CAC ACA TCT GTG TCC TCC GCC AAC GTC TTC
TGC AGC ACC TCC TGT GAC CAG CCC ACC TTG 
GAG TTG CTC CCG GGT GGG AAC AAC TGG AAG 
AGC CAA CCA ATG TGC TAT TCA AAC TGC CCT 
ACC GTG TAC TGG ACT CCA GAA CGG GTG GAA 
GGC AAG AAC CTT ACC CTA CGC TGC CAG GTG 
GTG GTA TTG CTC CGT GGG GAG GAG GAG CTG 
GAG GTC ACG GCC ACG GTG CTG GCG AGG AAA 
ACT GAA CTG GAC CTG CGG CCC CAA GGG CTG 
CAG CTC CAA ACC TTT GTC CTG CCA GCG ACT 
GAG GTG GAC ACG CAG GGG ACC GTG GTC TGT 
GCC CAG GTC CAC TTG GCA CTG GGG GAC CAG 
GAC TCC CTC TCG GCC AAG GCC TCA GTC AGT 
CTG TGG TGT GCA GTG ATA CTG AGG AAC CAG 
TAC AGC TTT CCT GCA CCC AAC GTG ACT CTG 
GTG ATA GTG AAG TGT GAG GCC CAC CCT GCA 
CAG CCG CCG GGC CCG AGG GCC CAG TTC CTG 
AGC TTC TCC TGC TCT GCA ACC CTG GAG GTG 
CGG GAG CTT CGA GTC CTG TAT GGC CCC CGA 
ACG TGG CCA GAA AAC TCC CAG CAG ACT CCA 
GAG CTC AAG TGT CTA AAG GAT GGC ACT TTC 
ACT CGA GAT CTT GAG GGC ACC TAC CTC TGT 
CGC GAG GTG ACC GTG AAT GTG CTC TCC CCC 
GCA GCC GCA GCC ATA CTG GGC ACT GCA GGC 
AAG ATC AGG ATA TAC AGA CTA CAA CAG GCT 
CAA ACC ACG CCT CCC 
functional effect indicates a correlation between the evolutionary significant nucleotide
change and the physiological condition. 

62. A method for identifying a target site which may be suitable for therapeutic 
intervention, comprising comparing a non-human polypeptide encoded in a polynucleotide 
sequence identified in the method of claim 19 with a corresponding human polypeptide, 
wherein a location of a molecular difference, if any, indicates a target site. 

63. A method for identifying a target site which may be suitable for therapeutic 
intervention, comprising comparing a human polypeptide encoded in a polynucleotide 
sequence identified in the method of claim 1 with a corresponding non-human polypeptide, 
wherein a location of a molecular difference, if any, indicates a target site. 
polypeptide encoded within a polynucleotide sequence identified in claim 19, or a
composition comprising said polypeptide, wherein an agent is identified by its ability to 
modulate function of the polypeptide sequence. 

57. An agent identified in the method of claim 56.

58. A method for correlating an evolutionary significant human nucleotide 
change to a physiological condition in a human, comprising: 

Analyzing a functional effect, if any, of a polynucleotide sequence identified in
claim 1 in a suitable model system, wherein presence of a functional effect indicates a
correlation between the evolutionary significant nucleotide change and the physiological
condition.

59. A method for correlating an evolutionary significant human nucleotide 
change to a physiological condition in a human, comprising:

Analyzing a functional effect, if any, of a polynucleotide sequence identified in 
claim 19 in a suitable model system, wherein presence of a functional effect indicates a 
correlation between the evolutionary significant nucleotide change and the physiological 
condition. 

60. A method for correlating an evolutionarily significant human nucleotide 
change to a physiological condition in a human, comprising: 

Analyzing a functional effect, if any, of a polypeptide encoded in a polynucleotide 
sequence identified in claim 1 in a suitable model system, wherein presence of a functional 
effect indicates a correlation between the evolutionarily significant nucleotide change and
the physiological condition. 

61. A method for correlating an evolutionarily significant human nucleotide
change to a physiological condition in a human, comprising:

Analyzing a functional effect, if any, of a polypeptide encoded in a polynucleotide
sequence identified in claim 19 in a suitable model system, wherein presence of a
b. identifying any nucleotide changes within the human sequences as
compared to the homologous sequences from the non-human primate, wherein the changes 
are evolutionary significant. 

49. The method of claim 48, wherein the protein coding sequences are from
brain.

50. A method of identifying an agent which may modulate a physiological
condition, said method comprising contacting at least one agent to be tested with a cell that
has been transfected with a polynucleotide sequence identified in claim 1, wherein an agent
is identified by its ability to modulate function of the polynucleotide sequence.

51. An agent identified by the method of claim 50. 

52. A method of identifying an agent which may modulate a physiological
condition, said method comprising contacting at least one agent to be tested with a cell that
has been transfected with a polynucleotide sequence identified in claim 19, wherein an
agent is identified by its ability to modulate function of the polynucleotide sequence.

53. An agent identified by the method of claim 52.

54. A method of identifying an agent which may modulate a physiological 
condition, said method comprising contacting at least one agent to be tested with a 
polypeptide encoded within a polynucleotide sequence identified in claim 1, or a
composition comprising said polypeptide, wherein an agent is identified by its ability to
modulate function of the polypeptide sequence. 

55. An agent identified in the method of claim 54. 

56. A method of identifying an agent which may modulate a physiological
condition, said method comprising contacting at least one agent to be tested with a
39. The method of claim 35, wherein the human brain protein-coding nucleotide
sequences correspond to human brain cDNAs. 

40. The method of claim 35, wherein the nucleotide change is a non- 
synonymous substitution. 

41. The method of claim 35, wherein the evolutionary significance of the
nucleotide change is determined according to the non-synonymous substitution rate (KA) of
the nucleotide sequence.

42. The method of claim 41, wherein the evolutionary significance of the
nucleotide change is determined by the ratio of the non-synonymous substitution rate (KA)
to the synonymous rate (KS) of the nucleotide sequence.

43. The method of claim 42, wherein the KA/KS ratio is at least about 0.75.

44. The method of claim 42, wherein the KA/KS ratio is at least about 1.00.

45. The method of claim 42, wherein the KA/KS ratio is at least about 1.25.

46. The method of claim 42, wherein the KA/KS ratio is at least about 1.50.

47. The method of claim 42, wherein the KA/KS ratio is at least about 2.00.

48. A method for large-scale sequence comparison between human protein- 
coding polynucleotide sequences and protein-coding polynucleotide sequences from a non- 
human primate, comprising: 

a. aligning the human polynucleotide sequences with corresponding 
polynucleotide sequences from a non-human primate according to sequence homology; and 



WO 99/39006

PCT/US99/01964

29. The method of claim 28, wherein the evolutionary significance of the
nucleotide change is determined by the ratio of the non-synonymous substitution rate (KA)
to the synonymous rate (KS) of the nucleotide sequence.

30. The method of claim 29, wherein the KA/KS ratio is at least about 0.75.

31. The method of claim 29, wherein the KA/KS ratio is at least about 1.00.

32. The method of claim 29, wherein the KA/KS ratio is at least about 1.25.

33. The method of claim 29, wherein the KA/KS ratio is at least about 1.50.

34. The method of claim 29, wherein the KA/KS ratio is at least about 2.00.

35. The method for identifying an evolutionarily significant change in a human 
brain protein-coding polynucleotide sequence, comprising the steps of: 

a. comparing human brain protein-coding polynucleotide sequences to 
corresponding sequences of a non-human primate; and 
b. selecting a human polynucleotide sequence that contains a nucleotide
change as compared to corresponding sequence of the non-human primate, wherein said
change is evolutionarily significant. 

36. The method of claim 35, wherein the non-human primate is a member of the 
hominoid group.

37. The method of claim 36, wherein the non-human primate is selected from 
the group consisting of chimpanzee, bonobo, gorilla, and orangutan. 

38. The method of claim 37, wherein the non-human primate is chimpanzee.
a. comparing human protein-coding polynucleotide sequences to
protein-coding polynucleotide sequences of a non-human primate, wherein said non-human 
primate does not have the physiological condition; and 

b. selecting a human polynucleotide sequence that contains a nucleotide 
change as compared to the corresponding sequence of the non-human primate, wherein said 
change is evolutionarily significant. 
21. The method of claim 19, wherein the physiological condition is a brain
function.

22. The method of claim 21, wherein the brain function is cognitive function.

23. The method of claim 19, wherein the human protein coding sequence is
associated with development of the physiological condition.

24. The method of claim 19, wherein the non-human primate is a member of the
hominoid group.



25. The method of claim 24, wherein the non-human primate is selected from 
the group consisting of chimpanzee, bonobo, gorilla, and orangutan. 

26. The method of claim 25, wherein the non-human primate is chimpanzee. 

27. The method of claim 19, wherein the nucleotide change is a non- 
synonymous substitution. 



28. The method of claim 19, wherein the evolutionary significance of the
nucleotide change is determined according to the non-synonymous substitution rate (KA) of
the nucleotide sequence.
9. The method of claim 8, wherein the non-human primate is selected from the
group consisting of chimpanzee, bonobo, gorilla, and orangutan. 

10. The method of claim 9, wherein the non-human primate is chimpanzee.

11. The method of claim 1, wherein the nucleotide change is a non-synonymous
substitution.

12. The method of claim 1, wherein the evolutionary significance of the
nucleotide change is determined according to the non-synonymous substitution rate (KA) of
the nucleotide sequence.

13. The method of claim 12, wherein the evolutionary significance of the
nucleotide change is determined by the ratio of the non-synonymous substitution rate (KA)
to the synonymous rate (KS) of the nucleotide sequence.

14. The method of claim 13, wherein the KA/KS ratio is at least about 0.75.

15. The method of claim 13, wherein the KA/KS ratio is at least about 1.00.

16. The method of claim 13, wherein the KA/KS ratio is at least about 1.25.

17. The method of claim 13, wherein the KA/KS ratio is at least about 1.50.

18. The method of claim 13, wherein the KA/KS ratio is at least about 2.00.

19. A method for identifying a human polynucleotide sequence encoding a
polypeptide, wherein said polypeptide may be associated with a physiological condition in 
the human comprising the steps of: 
CLAIMS

What is claimed is: 

1. A method for identifying a non-human primate polynucleotide sequence
encoding a polypeptide, wherein said polypeptide may be associated with a physiological
condition in the non-human primate, comprising the steps of:

a. comparing non-human primate protein-coding polynucleotide
sequences to protein-coding polynucleotide sequences of a human, wherein said human
does not have the physiological condition; and

b. selecting a non-human primate polynucleotide sequence that contains
a nucleotide change as compared to the corresponding sequence of the human, wherein said
change is evolutionarily significant.

2. The method of claim 1, wherein the physiological condition is resistance to a 
disease.



3. The method of claim 2, wherein the disease is cancer. 

4. The method of claim 2, wherein the disease is an infectious disease. 

5. The method of claim 4, wherein the infectious disease is a viral disease. 

6. The method of claim 5, wherein the viral disease is AIDS. 



7. The method of claim 1, wherein the non-human protein coding sequence is
associated with development of the physiological condition.

8. The method of claim 1, wherein the non-human primate is a member of the
hominoid group.
those of ordinary skill in the art that certain changes and modifications can be practiced.
Therefore, the description and examples should not be construed as limiting the scope of 
the invention, which is delineated by the appended claims. 
processing, storage and retrieval capabilities, language abilities, as well as others. In this
way, this approach provides a new strategy by which such genes can be identified. Lastly,
the database not only provides an ordered collection of candidate genes, it also provides the
precise molecular sequence differences that exist between human and chimpanzee (and
other non-human primates), and thus defines the changes that underlie the functional
differences.

In some cases functional differences are evaluated in suitable model systems, 
including, but not limited to, in vitro analysis such as indicia of long term potentiation 
(LTP), and use of transgenic animals or other suitable model systems. These will be 
immediately apparent to those skilled in the art.

EXAMPLE 14: Identification of Positive Selection in a Human Tyrosine Kinase Gene 

Using the methods of the present invention, a human gene (GenBank Acc. #
AB014541) expressed in brain has been identified that has been positively selected as
compared to its chimpanzee homologue. This gene, which codes for a tyrosine kinase, is
homologous to a well-characterized mouse gene (GenBank Acc. # AF011908) whose gene
product, called AATYK, is known to trigger apoptosis (Gaozza, E. et al. 1997 Oncogene
15:3127-3135). The literature suggests that this protein controls apoptosis in the
developing mouse brain (thus, in effect, "sculpting" the developing brain). The tyrosine 
kinase domain of this protein is highly conserved between mouse, chimpanzee, and human 
(as are most tyrosine kinases). Interestingly, however, the region of the protein to which 
signaling proteins bind has been positively-selected in humans, but strongly conserved in 
both chimpanzees and mice. The region of the human protein to which signaling proteins 
bind has not only been positively-selected as a result of point nucleotide mutation, but 
additionally displays duplication of several SH2 binding domains that exist only as single 
copies in mouse and chimpanzee. This suggests that a different set of signaling proteins 
may bind to the human protein, which could then trigger different pathways for apoptosis in 
the developing human brain compared to those in mice and chimpanzees. Such a gene thus 
may contribute to unique or enhanced human cognitive abilities. 

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity and understanding, it will be apparent to
above. This ratio, KA/KS, has been shown to be a reflection of the degree to which adaptive
evolution, i.e., positive selection, has been at work in the sequence under study. Typically,
full-length coding regions have been used in these comparative analyses. However, partial
segments of a coding region can also be used effectively. The higher the KA/KS ratio, the
more likely that a sequence has undergone adaptive evolution. Statistical significance of
KA/KS values is determined using established statistical methods and available programs, such
as the t-test. Those genes showing statistically high KA/KS ratios between chimpanzee and
human genes are very likely to have undergone adaptive evolution.
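As an illustration of how such a ratio can be computed, the following is a minimal sketch in Python. It is not the Li93 or INA program named in this description. It assumes a simplified Nei-Gojobori-style counting of synonymous and non-synonymous sites and differences, followed by a Jukes-Cantor correction. All function names are hypothetical, the input sequences are assumed to be aligned, in frame, gap-free and of equal length, and pathways through stop codons are not excluded.

```python
import math
from itertools import permutations, product

# Standard genetic code, built in TCAG order (assumed nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in a codon: for each position, the fraction of the
    three possible single-base changes that leave the amino acid unchanged."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def count_diffs(c1, c2):
    """Synonymous and non-synonymous differences between two codons,
    averaged over every order in which the differing positions may change."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    if not diff:
        return 0.0, 0.0
    sd = nd = 0.0
    paths = list(permutations(diff))
    for path in paths:
        cur = c1
        for pos in path:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[cur] == CODE[nxt]:
                sd += 1
            else:
                nd += 1
            cur = nxt
    return sd / len(paths), nd / len(paths)


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits."""
    if p == 0:
        return 0.0
    if p >= 0.75:
        raise ValueError("sequences too divergent for Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, KS, KA/KS); the ratio is None when KS is zero."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned, in frame, equal length")
    seq1, seq2 = seq1.upper(), seq2.upper()
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s = (syn_sites(c1) + syn_sites(c2)) / 2  # average over both codons
        S += s
        N += 3 - s
        sd, nd = count_diffs(c1, c2)
        Sd += sd
        Nd += nd
    ka = jukes_cantor(Nd / N)
    ks = jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else None)
```

A call such as `ka_ks(human_cds, chimp_cds)` returns the two rates and their ratio, which can then be compared against a chosen threshold. Testing whether a ratio is statistically significant additionally requires variance estimates for KA and KS, which this sketch does not compute.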

To further lend support to the significance of a high KA/KS ratio, the sequence under
study can be compared in other non-human primates, e.g., gorilla, orangutan, bonobo.
These comparisons allow further discrimination as to whether the adaptive evolutionary
changes are unique to the human lineage compared to other non-human primates. The
sequences can also be examined by direct sequencing of the gene of interest from
representatives of several diverse human populations to assess to what degree the sequence
is conserved in the human species.
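The multi-species comparison described above can be sketched as follows. This is an illustrative example only: it assumes the sequences have already been aligned to equal length, and the function name and species labels are hypothetical. It reports the alignment columns where the human base differs from every other primate while those other primates all agree with one another, which is one simple indication that a change is specific to the human lineage.

```python
def lineage_specific_sites(aligned, focal="human"):
    """Return (column, focal_base, shared_other_base) for each alignment
    column where the focal sequence differs from all other sequences and
    the other sequences are identical to one another. Columns are 1-based."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    others = [name for name in aligned if name != focal]
    sites = []
    for col in range(lengths.pop()):
        focal_base = aligned[focal][col]
        other_bases = {aligned[name][col] for name in others}
        if len(other_bases) == 1 and focal_base not in other_bases:
            sites.append((col + 1, focal_base, other_bases.pop()))
    return sites


# Toy alignment, invented for illustration only.
example = {
    "human": "ATGCCA",
    "chimpanzee": "ATGTCA",
    "gorilla": "ATGTCG",
    "orangutan": "ATGTCG",
}
print(lineage_specific_sites(example))
```

Passing `focal="chimpanzee"` applies the same check to the chimpanzee lineage.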

EXAMPLE 13: Further Sequence Characterization

Human brain nucleotide sequences containing evolutionarily significant changes are
further characterized in terms of their molecular and genetic properties, as well as their
biological functions. The identified coding sequences are used as probes to perform in situ
mRNA hybridization that reveals the expression pattern of the gene, in terms of the tissues
and cell types in which the sequences are expressed, when they are expressed during the
course of development or during the cell cycle, or both. Sequences that are
expressed in brain may be better candidates as being associated with important human brain
functions. Moreover, the putative genes containing the identified sequences are subjected to
homologue searching in order to determine what functional classes the sequences belong to.

Furthermore, for some proteins, the identified human sequence changes may be
useful in estimating the functional consequence of the change. By using such criteria a
database of candidate genes can be generated. Candidates are ranked as to the likelihood
that the gene is responsible for the unique or enhanced abilities found in the human brain
compared to chimpanzee or other non-human primates, such as high capacity information
EXAMPLE 11: Sequence Comparison

Randomly selected chimpanzee brain cDNA clones from the cDNA library are
sequenced using an automated sequencer, such as the ABI 377. Commonly used primers
on the cloning vector, such as the M13 Universal and Reverse primers, are used to carry out
the sequencing. For inserts that are not completely sequenced by end sequencing, dye-labeled
terminators are used to fill in remaining gaps.

The resulting chimpanzee sequences are compared to human sequences via database 
searches, e.g., BLAST searches. The high scoring "hits," i.e., sequences that show a 
significant (e.g., >80%) similarity after BLAST analysis, are retrieved and analyzed. The
two homologous sequences are then aligned using the alignment program CLUSTAL V 
developed by Higgins et al. Any sequence divergence, including nucleotide substitution, 
insertion and deletion, can be detected and recorded by the alignment. 
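The recording of divergence from a pairwise alignment can be illustrated with a short sketch. It assumes the two sequences have already been aligned by a separate program and are supplied as equal-length strings with "-" marking gaps. The function name is hypothetical. Because a pairwise alignment alone cannot say in which lineage an insertion or deletion occurred, gaps are simply reported by the sequence in which they appear.

```python
def record_divergence(human_aln, chimp_aln):
    """Scan a gapped pairwise alignment and return (identity, events).

    identity is the fraction of ungapped columns with identical bases;
    events lists (column, kind, human_base, chimp_base), 1-based columns."""
    if len(human_aln) != len(chimp_aln):
        raise ValueError("aligned sequences must have the same length")
    events = []
    matches = compared = 0
    pairs = zip(human_aln.upper(), chimp_aln.upper())
    for col, (h, c) in enumerate(pairs, start=1):
        if h == "-" and c == "-":
            continue  # column carries no information
        if h == "-":
            events.append((col, "gap in human", h, c))
        elif c == "-":
            events.append((col, "gap in chimpanzee", h, c))
        else:
            compared += 1
            if h == c:
                matches += 1
            else:
                events.append((col, "substitution", h, c))
    identity = matches / compared if compared else 0.0
    return identity, events


# Toy alignment, invented for illustration only.
identity, events = record_divergence("ATGC-TA", "ATGAGT-")
print(f"{identity:.0%} identity over ungapped columns")
for event in events:
    print(event)
```

The identity value computed here is over ungapped columns only, so it is not directly comparable to the similarity score reported by a database search.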

The detected sequence differences are initially checked for accuracy by finding the
points where there are differences between the chimpanzee and human sequences; checking
the sequence fluorogram (chromatogram) to determine if the bases that appear unique to
human correspond to strong, clear signals specific for the called base; checking the human
hits to see if there is more than one human sequence that corresponds to a sequence change;
and other methods known in the art as needed. Multiple human sequence entries for the
same gene that have the same nucleotide at a position where there is a different chimpanzee
nucleotide provide independent support that the human sequence is accurate, and that the
chimpanzee/human difference is real. Such changes are examined using public database
information and the genetic code to determine whether these DNA sequence changes result
in a change in the amino acid sequence of the encoded protein. The sequences can also be
examined by direct sequencing of the encoded protein.
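Using the genetic code to decide whether a nucleotide change alters the encoded amino acid can be sketched as below. The example assumes the standard nuclear genetic code and two coding sequences that are aligned, in frame, gap-free and of equal length. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (assumed nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(human, chimp):
    """Return (codon_number, human_codon, chimp_codon, kind) for each codon
    that differs; kind is 'synonymous' when both codons encode the same
    amino acid and 'non-synonymous' otherwise. Codon numbers are 1-based."""
    if len(human) != len(chimp) or len(human) % 3:
        raise ValueError("sequences must be aligned, in frame, equal length")
    changes = []
    for i in range(0, len(human), 3):
        h = human[i:i + 3].upper()
        c = chimp[i:i + 3].upper()
        if h == c:
            continue
        same_residue = CODON_TABLE[h] == CODON_TABLE[c]
        kind = "synonymous" if same_residue else "non-synonymous"
        changes.append((i // 3 + 1, h, c, kind))
    return changes


# Toy sequences, invented for illustration only.
for change in classify_codon_changes("CTGAAAGGC", "CTAGAAGGC"):
    print(change)
```

In the toy input, the first codon changes from CTG to CTA (both leucine) and the second from AAA (lysine) to GAA (glutamic acid), so one change of each kind is reported.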

EXAMPLE 12: Molecular Evolution Analysis 

The chimpanzee and human sequences under comparison are subjected to KA/KS
analysis. In this analysis, publicly available computer programs, such as Li93 and INA,
are used to determine the number of non-synonymous changes per site (KA) divided by the
number of synonymous changes per site (KS) for each sequence under study as described
Using the methods of the present invention, a chimpanzee gene expressed in brain
that has been positively selected (KA/KS = 1.6) as compared to its human homologue (GenBank
Acc. # X87176) has been identified. The human gene, 17-β hydroxysteroid dehydrogenase
type IV, codes for a protein known to degrade the two most potent estrogens, β-estradiol
and 5-diol (Adamski, J. et al. 1995 Biochem J. 311:437-443). Estrogen-related cancers
(including, for example, breast and prostate cancers) account for some 40% of human
cancers. Interestingly, reports in the literature suggest that chimpanzees are resistant to
tumorigenesis, especially of estrogen-related tumors. This protein may have been
positively selected in chimpanzees to allow more efficient degradation of estrogens, thus
conferring upon chimpanzees resistance to such cancers. If so, the specific amino acid
replacements observed in the chimpanzee protein may supply important information for
therapeutic intervention in human cancers.

EXAMPLE 10: cDNA Library Construction 

A chimpanzee brain cDNA library is constructed using chimpanzee brain tissue.
The chimpanzee brain tissue can be obtained after natural death so that no animal need be
killed for this study. In order to increase the chance of obtaining intact mRNAs
expressed in brain, however, the brain is obtained as soon as possible after the animal's
death. Preferably, the weight and age of the animal are determined prior to death. The
brain tissue used for constructing a cDNA library is preferably the whole brain in order to
maximize the inclusion of mRNA expressed in the entire brain. Brain tissue is dissected
from the animal following standard surgical procedures.

Total RNA is extracted from the brain tissue and the integrity and purity of the
RNA are determined according to conventional molecular cloning methods. Poly A+ RNA
is selected and used as template for the reverse-transcription of cDNA with oligo (dT) as a
primer. The synthesized cDNA is treated and modified for cloning using commercially
available kits. Recombinants are then packaged and propagated in a host cell line. Portions
of the packaging mixes are amplified and the remainder retained prior to amplification.
The library can be normalized and the number of independent recombinants in the library
is determined.
our modeling studies to significantly change the positional angle between domains 2 and 3
of human and chimpanzee ICAM-3. The human protein displays an acute angle at this
juncture. Klickstein, et al., 1996 J. Biol. Chem. 271:23920-27. Loss of this sharp angle
(bend) is predicted to render chimpanzee ICAM-3 less easily packaged into HIV-1 virions.
(In infected humans, after ICAMs are packaged into HIV virions, cell-to-cell infectivity
dramatically increases. Barbeau, B. et al., 1998 J. Virol. 72:7125-7136.) This failure to
easily package chimp ICAM-3 into HIV virions could then prevent the increase in cell-to-cell
infectivity seen in infected humans. This would then account for chimpanzee
resistance to AIDS progression.

A small molecule therapeutic intervention is possible, whereby binding of a suitably-designed
small molecule to the human proline residue causes (as a result of steric hindrance) the
human ICAM-1 protein to mimic the larger (i.e., less-acute) angle of chimpanzee ICAM-3.
Conservation between the 2 proteins of the critical binding residues (and the
general resemblance of immune responses between humans and chimpanzees) argues that
alteration of this angle will not compromise the basic function of human ICAM-3.
However, the human ICAM-3 protein would be rendered resistant to packaging into HIV
virions, thus mimicking (in HIV-1 infected humans) the postulated pathway by which
infected chimpanzees resist progression to AIDS.

EXAMPLE 8: Identifying Positive Selection of MIP-1α

MIP-1α is a chemokine that has been shown to suppress HIV-1 replication in
human cells in vitro (Cocchi, F. et al., 1995 Science 270:1811-1815). The chimpanzee
homologue of the human MIP-1α gene was PCR-amplified and sequenced. Calculation of
the KA/KS ratio (2.1, P<0.05) and comparison to the gorilla homologue reveal that the
chimpanzee gene has been positively selected. As for the other genes discussed herein, the
nature of the chimpanzee amino acid replacements is being examined to determine how to
exploit the chimpanzee protein for therapeutic intervention.

EXAMPLE 9: Identifying Positive Selection of 17-β-Hydroxysteroid Dehydrogenase
and an AIDS-resistant non-human primate (such as chimpanzee) and thus defines the
changes that underlie the functional differences. 

EXAMPLE 7: Molecular Modeling of ICAM-1 and ICAM-3 

Modeling of the three-dimensional structure of ICAM-1 and ICAM-3 has provided
additional evidence for the role of these proteins in explaining chimpanzee resistance to
AIDS progression.

In the case of ICAM-1, 5 of the 6 amino acid replacements that are unique to the
chimpanzee lineage are immediately adjacent (i.e., physically touching) to those amino
acids identified by mutagenic studies as critical to LFA-1 binding. These five amino acid
replacements are human L18 to chimp Q18, human K29 to chimp D29, human P45 to
chimp G45, human R49 to chimp W49, and human E171 to chimp Q171. This positioning
cannot be predicted from the primary structure (i.e., the actual sequence of amino acids).
None of the amino acid residues critical for binding has changed in the chimpanzee ICAM-1
protein.

Such positioning argues strongly that the chimpanzee ICAM-1 protein's basic
function is unchanged between humans and chimpanzees; however, evolution has wrought
fine-tuned changes that may help confer upon chimpanzees their resistance to progression
of AIDS. The nature of the amino acid replacements is being examined to allow
exploitation of the three-dimensional structural information for developing agents for
therapeutic intervention. Strikingly, 4 of the 5 chimpanzee residues are adjacent to critical
binding residues that have been identified as N-linked glycosylation sites. This suggests
that differences exist in binding constants (to LFA-1) for human and chimpanzee ICAM-1.
These binding constants are being determined. Should the binding constants prove lower in
chimpanzee ICAM-1, it is possible to devise small molecule agents to mimic (by way of
steric hindrance) the change in binding constants as a potential therapeutic strategy for
HIV-infected humans. Similarly, stronger binding constants, if observed for chimpanzee
ICAM-1, will suggest alternative strategies for developing therapeutic interventions for
HIV-1 infected humans.

In the case of ICAM-3, a critical amino acid residue replacement from proline
(observed in seven humans) to glutamine (observed in three chimpanzees) is predicted from
example, an ELISA assay in which rabbit anti-p24 antibodies are used as capture antibody,
biotinylated rabbit anti-p24 antibodies serve as detection antibody, and the assay is
developed with avidin-horseradish peroxidase. To determine the number of infected cells,
any known method, including indirect immunofluorescence methods, can be used. In
indirect immunofluorescence methods, human HIV-positive serum can be used as a source
of anti-HIV antibodies to bind to infected cells. The bound antibodies can be detected
using FITC-conjugated anti-human IgG, the cells visualized by fluorescence microscopy
and counted.

Another method for assessing the role of a molecule such as ICAM-1 (or ICAM-2
or ICAM-3) involves successive infection of cells with HIV. Human cell lines, preferably
those that do not express endogenous ICAM (although cell lines that do express
endogenous ICAM may also be used), are transfected with either human or chimpanzee
ICAM-1 or -2 or -3. In one set of experiments, HIV is collected from the supernatant of
HIV-infected human ICAM-1 (or ICAM-2 or ICAM-3)-expressing cells and used to infect
chimpanzee ICAM-1 (or ICAM-2 or ICAM-3)-expressing cells or human ICAM-1 (or
ICAM-2 or ICAM-3)-expressing cells. Initial infectivity, measured as described above, of
both the chimpanzee ICAM-1 (or ICAM-2 or ICAM-3)- and the human ICAM-1 (or
ICAM-2 or ICAM-3)-expressing cells would be expected to be high. After several rounds
of replication, cell to cell infectivity would be expected to decrease in the chimpanzee
ICAM-1 (or ICAM-2 or ICAM-3) expressing cells, if chimpanzee ICAM-1 (or ICAM-2 or
ICAM-3) confers resistance. In a second set of experiments, HIV is collected from the
supernatant of HIV-infected chimpanzee ICAM-1 (or ICAM-2 or ICAM-3)-expressing
cells, and used to infect human ICAM-1 (or ICAM-2 or ICAM-3)-expressing cells. In this
case, the initial infectivity would be expected to be much lower than in the first set of
experiments, if ICAM-1 (or ICAM-2 or ICAM-3) is involved in susceptibility to HIV
progression. After several rounds of replication, the cell to cell infectivity would be
expected to increase.

The identified human sequences can be used in establishing a database of candidate
human genes that may be involved in conferring, or contributing to, AIDS susceptibility or
resistance. Moreover, the database not only provides an ordered collection of candidate
genes, it also provides the precise molecular sequence differences that exist between human
vector. To obtain maximal expression, the cloned ICAM-1 (or ICAM-2 or ICAM-3)
coding region is operably linked to a promoter which is active in human T cells, such as, 
for example, an IL-2 promoter. Alternatively, an ICAM-1 (or ICAM-2 or ICAM-3) cDNA 
can be placed under transcriptional control of a strong constitutive promoter, or an 
inducible promoter. Expression systems are well known in the art, as are methods for 
introducing an expression vector into cells. For example, an expression vector comprising 
an ICAM-1 (or ICAM-2 or ICAM-3) cDNA can be introduced into cells by DEAE-dextran 
or by electroporation, or any other known method. The cloned ICAM-1 (or ICAM-2 or 
ICAM-3) molecule is then expressed on the surface of the cell. Determination of whether 
an ICAM-1 (or ICAM-2 or ICAM-3) cDNA is expressed on the cell surface can be 
accomplished using antibody(ies) specific for ICAM-1 (or ICAM-2 or ICAM-3). In the 
case of chimpanzee ICAM-1 (or ICAM-2 or ICAM-3) expressed on the surface of human T 
cells, an antibody which distinguishes between chimpanzee and human ICAM-1 (or ICAM- 
2 or ICAM-3) can be used. This antibody can be labeled with a detectable label, such as a 
fluorescent dye. Cells expressing chimpanzee ICAM-1 (or ICAM-2 or ICAM-3) on their 
surfaces can be detected using fluorescence-activated cell sorting and the anti-ICAM-1 (or 
ICAM-2 or ICAM-3) antibody appropriately labeled, using well-established techniques. 

Transfected human cells expressing chimpanzee ICAM-1 (or ICAM-2 or ICAM-3) 
on their cell surface can then be tested for syncytia formation, and/or for HIV replication, 
and/or for number of cells infected as an index of cell-to-cell infectivity. The chimpanzee 
ICAM-1 (or ICAM-2 or ICAM-3)-expressing cells can be infected with HIV-1 at an 
appropriate dose, for example tissue culture infectious dose 50, i.e., a dose which can infect 
50% of the cells. Cells can be plated at a density of about 5 x 10^5 cells/ml in appropriate
tissue culture medium, and, after infection, monitored for syncytia formation, and/or viral 
replication, and/or number of infected cells in comparison to control, uninfected cells. 
Cells which have not been transfected with chimpanzee ICAM-1 (or ICAM-2 or ICAM-3) 
also serve as controls. Syncytia formation is generally observed in HIV-1-infected cells
(which are not expressing chimpanzee ICAM-1 (or ICAM-2 or ICAM-3)) approximately 10 
days post-infection. 

To monitor HIV replication, cell supernatants can be assayed for the presence and 
amount of p24 antigen. Any assay method to detect p24 can be used, including, for 
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Nature 362:359-362; Pantaleo et al (1993) Nature 362:355-358. Syncytia may simply be 
scavenged from the body too quickly to be detected. Fouchier et al (1996) Virology 
219:87-95. Syncytia-mediated loss of CD4+ cells in vivo has been speculated to occur; this
could contribute directly to compromise of the immune system, leading to opportunistic
infection and full-blown AIDS. Sodroski et al. (1986) Nature 322:470-474; Hildreth and
Orentas (1989) Science 244:1075-1078; Finkel and Banda (1994). Thus critical changes in
chimpanzee ICAM-1, ICAM-2 or ICAM-3 may deter syncytia formation in chimpanzees
and help explain chimpanzee resistance to AIDS progression. Because of the
polyfunctional nature of ICAMs, these positively selected changes in the ICAM genes may
additionally confer resistance to other infectious diseases or may play a role in other
inflammatory processes that may also be of value in the development of human
therapeutics. The polypeptide sequence alignments of ICAM-1, -2, and -3 are shown in
Figures 5, 6, and 7, respectively.

EXAMPLE 6: Characterization of ICAM-1, ICAM-2 and ICAM-3 positively selected
sequences

A sequence identified by the methods of this invention may be further tested and 
characterized by cell transfection experiments. For example, human cells in culture, when 
transfected with a chimpanzee polynucleotide identified by the methods described herein 
(such as ICAM-1 (or ICAM-2 or ICAM-3); see below), could be tested for reduced viral
dissemination and/or propagation using standard assays in the art, and compared to control
cells. Other indicia may also be measured, depending on the perceived or apparent
functional nature of the polynucleotide/polypeptide to be tested. For example, in the case
of ICAM-1 (or ICAM-2 or ICAM-3), syncytia formation may be measured and compared
to control (untransfected) cells. This would test whether the resistance arises from
prevention of syncytia formation in infected cells.

Cells which are useful in characterizing sequences identified by the methods of this 
invention and their effects on cell-to-cell infection by HIV-1 are human T-cell lines which 
are permissive for infection with HIV-1, including the H9 and HUT78 cell lines, which
are available from the ATCC.

For cell transfection assays, ICAM-1 (or ICAM-2 or ICAM-3) cDNA (or any cDNA
identified by the methods described herein) can be cloned into an appropriate expression
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suggesting that the ICAM-l/LFA-1 interaction has been subjected to unusually strong 
selective pressures. 



Table 3. KA/KS Ratios: Domains D1+D2 of ICAM-1

Species Compared            KA/KS Ratio
Chimpanzee to Human         3.1 (P < 0.01)
Chimpanzee to Gorilla       2.5 (P < 0.05)
Chimpanzee to Orangutan     1.5 (P < 0.05)
Human to Gorilla            1.0
Human to Orangutan          0.90
Gorilla to Orangutan        1.0


Essentially the same procedures were used to identify chimpanzee ICAM-2 and ICAM-3
as positively selected (see Table 4). The ligand binding domain of ICAM-1
has been localized as exhibiting especially striking positive selection, in contrast to ICAM-2
and ICAM-3, for which positive selection resulted in amino acid replacements throughout the
protein. Thus, this comparative genomic analysis reveals that positive selection on ICAMs
in chimpanzees has altered the proteins' primary structure, for example, in important
binding domains. These alterations may have conferred resistance to AIDS progression in
chimpanzees.



Table 4. KA/KS Ratios: ICAM-2 and ICAM-3 Whole Protein Comparisons

Species Compared                KA/KS Ratio
Chimpanzee to Human ICAM-2      2.1 (P < 0.01)
Chimpanzee to Human ICAM-3      3.7 (P < 0.01)



Binding of ICAM-1, -2, and -3 has been demonstrated to play an essential role in
the formation of syncytia (i.e., giant, multi-nucleated cells) in HIV-infected cells in vitro.
Pantaleo et al. (1991) J. Exp. Med. 173:511-514. Syncytia formation is followed by the
depletion of CD4+ cells in vitro. Pantaleo et al. (1991); Levy (1993) Microbiol. Rev.
57:183-189; Butini et al. (1994) Eur. J. Immunol. 24:2191-2195; Finkel and Banda (1994) Curr.
Opin. Immunol. 6:605-615. Although syncytia formation is difficult to detect in vivo,
clusters of infected cells are seen in lymph nodes of infected individuals. Pantaleo et al.
(1993) N. Engl. J. Med. 328:327-335; Finkel and Banda (1994); Embretson et al. (1993)
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to demonstrate that chimpanzee ICAM-1 has been positively-selected, the orangutan 
ICAM-1 was compared as well, since the postulated historical range of gorillas in Africa
suggests that gorillas could have been exposed to the HIV-1 virus. Nowak and Paradiso
(1983) Walker's Mammals of the World (Baltimore, MD, The Johns Hopkins University
Press). The orangutan, however, has always been confined to Southeast Asia and is thus
unlikely to have been exposed to HIV over an evolutionary time frame (Nowak and
Paradiso, 1983). (Gorillas are most closely related to humans and chimpanzees, while
orangutans are more distantly related.)



Table 2. KA/KS Ratios: ICAM-1 Whole Protein Comparisons

Species Compared            KA/KS Ratio
Chimpanzee to Human         2.1 (P < 0.01)
Chimpanzee to Gorilla       1.9 (P < 0.05)
Chimpanzee to Orangutan     1.4 (P < 0.05)
Human to Gorilla            1.0
Human to Orangutan          0.87
Gorilla to Orangutan        0.95

Even among those proteins for which positive selection has been demonstrated, few
show KA/KS ratios as high as these ICAM-1 comparisons. Lee and Vacquier (1992) Biol.
Bull. 182:97-104; Swanson and Vacquier (1995) Proc. Natl. Acad. Sci. USA 92:4957-4961;
Messier and Stewart (1997); Sharp (1997) Nature 385:111-112. The results are consistent
with strong selective pressure resulting in adaptive changes in the chimpanzee ICAM-1
molecule.

The domains (D1 and D2) of the ICAM-1 molecule which bind to LFA-1 have been
documented. Staunton et al. (1990) Cell 61:243-254. Pairwise KA/KS ratios
between primate ICAM-1 genes were calculated for domains D1 and D2
only, using the algorithm of Li (1985; 1993) (Table 3). Statistically significant
comparisons (determined by t-tests) are shown in bold. The very high, statistically
significant KA/KS ratios for domains D1 and D2 suggest that these regions of the protein
were very strongly positively selected. These regions of chimpanzee ICAM-1 display even
more striking KA/KS ratios (Table 3) than are seen for the whole protein comparisons, thus
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The ICAM proteins, members of the Ig superfamily, are ligands for the integrin leukocyte 
associated function 1 molecule (LFA-1). Makgoba et al. (1988) Nature 331:86-88. LFA-1
is expressed on the surface of most leukocytes, while ICAMs are expressed on the surface
of both leukocytes and other cell types. Larson et al. (1989) J. Cell Biol. 108:703-712.
ICAM and LFA-1 proteins are involved in several immune response interactions, including
T-cell function and targeting of leukocytes to areas of inflammation. Larson et al. (1989).

Total RNA was prepared using either the RNeasy kit (Qiagen), or the RNAse-free
Rapid Total RNA kit (5 Prime - 3 Prime, Inc.) from primate tissues (chimpanzee brain and
blood, gorilla blood and spleen, orangutan blood) or from cells harvested from the
following B lymphocyte cell lines: CARL (chimpanzee), ROK (gorilla), and PUTI
(orangutan). mRNA was isolated from total RNA using the Mini-Oligo(dT) Cellulose Spin
Columns (5 Prime - 3 Prime, Inc.). cDNA was synthesized from mRNA with oligo dT
and/or random priming using the cDNA Synthesis Kit (Stratagene). The protein-coding
region of the primate ICAM-1 gene was amplified from cDNA using primers
(concentration = 100 nmole/µl) designed by hand from the published human sequence. PCR
conditions for ICAM-1 amplification were 94°C initial pre-melt (4 min), followed by 35
cycles of 94°C (15 sec), 58°C (1 min 15 sec), 72°C (1 min 15 sec), and a final 72°C
extension for 10 minutes. PCR was accomplished using Ready-to-Go PCR beads
(Amersham Pharmacia Biotech) in a 50 microliter total reaction volume. Appropriately
sized products were purified from agarose gels using the QiaQuick Gel Extraction kit
(Qiagen). Both strands of the amplification products were sequenced directly using the Big
Dye Cycle Sequencing Kit and analyzed on a 373A DNA sequencer (ABI BioSystems).
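
The cycling conditions stated above can be written down as a small data structure, which makes the programmed hold time easy to check. The following is only a sketch: the names PCR_PROGRAM and total_hold_seconds are illustrative and do not come from the specification, and ramp times between temperatures are ignored.

```python
# Thermal program for ICAM-1 amplification as stated in the text.
# Each step is (temperature in degrees C, hold time in seconds).
# The dictionary layout and names are illustrative assumptions.
PCR_PROGRAM = {
    "initial_denaturation": (94, 4 * 60),            # 94 C pre-melt, 4 min
    "cycles": 35,
    "cycle_steps": [(94, 15), (58, 75), (72, 75)],   # denature, anneal, extend
    "final_extension": (72, 10 * 60),                # 72 C, 10 min
}


def total_hold_seconds(program):
    """Sum the programmed hold times, ignoring ramping between temperatures."""
    per_cycle = sum(seconds for _temp, seconds in program["cycle_steps"])
    return (
        program["initial_denaturation"][1]
        + program["cycles"] * per_cycle
        + program["final_extension"][1]
    )


if __name__ == "__main__":
    seconds = total_hold_seconds(PCR_PROGRAM)
    print(f"Programmed hold time: {seconds} s ({seconds / 60:.1f} min)")
```

Under these assumptions the program holds for 6615 seconds, or a little over 110 minutes, before instrument ramp time is added.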

Comparison of the protein-coding portions of the human, gorilla (Gorilla gorilla), 
and orangutan (Pongo pygmaeus) ICAM-1 genes to that of the chimpanzee yielded 
statistically significant KA/KS ratios (Table 2). The protein-coding portions of the human
and chimpanzee ICAM-1 genes were previously published and the protein-coding portions 
of gorilla (Gorilla gorilla), and orangutan (Pongo pygmaeus) ICAM-1 genes are shown in 
Figures 3 and 4, respectively. 

For this experiment, pairwise KA/KS ratios were calculated for the mature protein
using the algorithm of Li (1985; 1993). Statistically significant comparisons (determined
by t-tests) are shown in bold. Although the comparison to gorilla and human was sufficient
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checking the human hits to see if there is more than one human sequence that corresponds 
to a sequence change; and other methods known in the art, as needed. Multiple human 
sequence entries for the same gene that have the same nucleotide at a position where there
is a different chimpanzee nucleotide provide independent support that the human sequence
is accurate, and that the chimpanzee/human difference is real. Such changes are examined
using public database information and the genetic code to determine whether these DNA 
sequence changes result in a change in the amino acid sequence of the encoded protein. 
The sequences can also be examined by direct sequencing of the encoded protein. 
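
The check described above, using the genetic code to decide whether a nucleotide difference changes the encoded amino acid, can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes two in-frame, gap-free coding sequences of equal length, and the function name classify_codon_changes and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq_a, seq_b):
    """Return (codon_index, codon_a, codon_b, kind) for each differing codon.

    Assumes both sequences are aligned, in frame, and contain no gaps.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    changes = []
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        same_aa = CODON_TABLE[codon_a] == CODON_TABLE[codon_b]
        kind = "synonymous" if same_aa else "nonsynonymous"
        changes.append((i // 3, codon_a, codon_b, kind))
    return changes


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    human = "CTGAAAGGC"
    chimp = "CTAAGAGGC"
    for change in classify_codon_changes(human, chimp):
        print(change)
```

In the toy input, the first codon differs without changing the amino acid (leucine in both), while the second changes lysine to arginine and is therefore reported as nonsynonymous.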

EXAMPLE 3: Molecular Evolution Analysis 

The chimpanzee and human sequences under comparison are subjected to KA/KS
analysis. In this analysis, publicly available computer programs, such as Li 93 and INA,
are used to determine the number of non-synonymous changes per site (KA) divided by the
number of synonymous changes per site (KS) for each sequence under study as described
above. Full-length coding regions or partial segments of a coding region can be used. The
higher the KA/KS ratio, the more likely that a sequence has undergone adaptive evolution.
Statistical significance of KA/KS values is determined using established statistical methods
and available programs such as the t-test.
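
To make the ratio concrete, the sketch below computes KA, KS and KA/KS from counts of differences and sites. It is not the Li (1985; 1993) algorithm named in this specification; it substitutes a simpler approach, a Jukes-Cantor correction applied to the proportion of differing sites in each class, and it assumes the site and difference counts have already been obtained. The function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p):
    """Correct a proportion of differing sites for multiple substitutions."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks_ratio(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (KA, KS, KA/KS) from precomputed difference and site counts.

    Simplified stand-in for illustration; not the Li (1985; 1993) method.
    """
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ZeroDivisionError("KS is zero; the ratio is undefined")
    return ka, ks, ka / ks


if __name__ == "__main__":
    # Invented counts, for illustration only.
    ka, ks, ratio = ka_ks_ratio(nonsyn_diffs=30, nonsyn_sites=600,
                                syn_diffs=5, syn_sites=200)
    print(f"KA={ka:.4f} KS={ks:.4f} KA/KS={ratio:.2f}")
```

A ratio above 1, as in this invented input, is the pattern the text associates with adaptive evolution; whether it is statistically significant is a separate test.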

To further lend support to the significance of a high KA/KS ratio, the sequence under
study can be compared in multiple chimpanzee individuals and in other non-human 
primates, e.g., gorilla, orangutan, bonobo. These comparisons allow further discrimination 
as to whether the adaptive evolutionary changes are unique to the human lineage compared 
to other non-human primates. The sequences can also be examined by direct sequencing of 
the gene of interest from representatives of several diverse human populations to assess to 
what degree the sequence is conserved in the human species. 

EXAMPLE 4: Identification of positively selected ICAM-1, ICAM-2 and ICAM-3 

Using the methods of the invention described herein, the intercellular adhesion
molecules ICAM-1, ICAM-2 and ICAM-3 have been shown to have been strongly
positively selected. The ICAM molecules are involved in several immune response 
interactions and are known to play a role in progression to AIDS in HIV infected humans. 
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variations are considered to fall within the scope of the invention as described and claimed 
herein. 

EXAMPLES 
EXAMPLE 1: cDNA Library Construction 

A chimpanzee cDNA library is constructed using chimpanzee tissue. 

Total RNA is extracted from the tissue (RNeasy kit, Qiagen; RNAse-free Rapid
Total RNA kit, 5 Prime - 3 Prime, Inc.) and the integrity and purity of the RNA are
determined according to conventional molecular cloning methods. Poly A+ RNA is
isolated (Mini-Oligo(dT) Cellulose Spin Columns, 5 Prime - 3 Prime, Inc.) and used as
template for the reverse-transcription of cDNA with oligo (dT) as a primer. The
synthesized cDNA is treated and modified for cloning using commercially available kits.
Recombinants are then packaged and propagated in a host cell line. Portions of the
packaging mixes are amplified and the remainder retained prior to amplification. The
library can be normalized and the number of independent recombinants in the library is
determined.

EXAMPLE 2: Sequence Comparison

Suitable primers based on a candidate human gene are prepared and used for PCR 
amplification of chimpanzee cDNA either from a cDNA library or from cDNA prepared 
from mRNA. Selected chimpanzee cDNA clones from the cDNA library are sequenced 
using an automated sequencer, such as an ABI 377. Commonly used primers on the 
cloning vector such as the M13 Universal and Reverse primers are used to carry out the
sequencing. For inserts that are not completely sequenced by end sequencing, dye-labeled 
terminators are used to fill in remaining gaps. 

The detected sequence differences are initially checked for accuracy, for example by 
finding the points where there are differences between the chimpanzee and human 
sequences; checking the sequence fluorogram (chromatogram) to determine if the bases that 
appear unique to human correspond to strong, clear signals specific for the called base; 
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sequences with at least one evolutionarily significant change between human and one 
AIDS-resistant non-human primate can be used as primers for PCR analysis of other non- 
human primate protein-coding sequences, and resulting polynucleotides are sequenced to 
see whether the same change is present in other non-human primates. These comparisons 
allow further discrimination as to whether the adaptive evolutionary changes are unique to
the AIDS-resistant non-human primate (such as chimpanzee) as compared to other
non-human primates. For example, a nucleotide change that is detected in chimpanzee but not
other primates more likely represents positive selection on the chimpanzee gene. Other
non-human primates used for comparison can be selected based on their phylogenetic
relationships with human. Closely related primates can be those within the hominoid
sublineage, such as chimpanzee, bonobo, gorilla, and orangutan. Non-human primates can
also be those that are outside the hominoid group and thus not so closely related to human,
such as the Old World monkeys and New World monkeys. Statistical significance of such
comparisons may be determined using established available programs, e.g., t-test as used by
Messier and Stewart (1997) Nature 385:151-154.
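
The idea that a change seen in chimpanzee but not in other primates is more likely to reflect selection on the chimpanzee gene can be illustrated with a simple scan of an alignment. This is a sketch under stated assumptions: the sequences are already aligned and of equal length, the function name lineage_specific_sites is hypothetical, and the toy sequences are invented for the example.

```python
def lineage_specific_sites(alignment, focal):
    """Return positions where the focal species differs from all others
    and all the other species agree with one another.

    alignment: dict mapping species name to an aligned sequence string.
    """
    focal_seq = alignment[focal]
    others = [seq for name, seq in alignment.items() if name != focal]
    if not others:
        raise ValueError("need at least one comparison species")
    if any(len(seq) != len(focal_seq) for seq in others):
        raise ValueError("all sequences must have the same aligned length")
    sites = []
    for pos, base in enumerate(focal_seq):
        other_bases = {seq[pos] for seq in others}
        if len(other_bases) == 1 and base not in other_bases:
            sites.append(pos)
    return sites


if __name__ == "__main__":
    # Toy alignment invented for illustration only.
    toy = {
        "chimpanzee": "ATGGCA",
        "human": "ATGACA",
        "gorilla": "ATGACA",
        "orangutan": "ATGACC",
    }
    print(lineage_specific_sites(toy, "chimpanzee"))
```

Only position 3 (zero-based) is reported for the toy alignment: the chimpanzee base differs there while human, gorilla and orangutan agree. Position 5 is not reported because the comparison species disagree among themselves.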

Furthermore, sequences with significant changes can be used as probes in genomes
from different humans to see whether the sequence changes are shared by more than one
individual. For example, certain individuals are slower to progress to AIDS ("slow
progressors") and comparison (a) between a chimpanzee sequence and the homologous
sequence from the slow-progressor human individual and/or (b) between an
AIDS-susceptible individual and a slow-progressor individual would be of interest. Gene
sequences from different human populations can be obtained from databases made
available by, for example, the human genome diversity project or, alternatively, from direct
sequencing of PCR-amplified DNA from a number of unrelated, diverse human
populations. The presence of the identified changes in human slow progressors would
further indicate the evolutionary significance of the changes.

The following examples are provided to further assist those of ordinary skill in the
art. Such examples are intended to be illustrative and therefore should not be regarded as
limiting the invention. A number of exemplary modifications and variations are described
in this application and others will become apparent to those of skill in this art. Such
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interferon a (IFN- a) 
cytokine 

CD44                                                  cell adhesion
CD8                                                   glycoprotein

Genes already examined (12)

ICAM-1                                                Immune system
ICAM-2                                                Immune system
ICAM-3                                                Immune system
leukocyte associated function 1 molecule α (LFA-1)    Immune system
leukocyte associated function 1 molecule β (LFA-1)    Immune system
Mac-1 α                                               Immune system
Mac-1 β (equivalent to LFA-1 β)                       Immune system
CXCR4                                                 chemokine receptor
CCR5                                                  chemokine receptor
MIP-1α                                                chemokine
MIP-1β                                                chemokine
RANTES                                                chemokine


Aligned protein-coding sequences of human and an AIDS-resistant non-human
primate such as chimpanzee are analyzed to identify nucleotide sequence differences at
particular sites. The detected sequence changes are generally, and preferably, initially
checked for accuracy as described above. The evolutionarily significant nucleotide
changes, which are detected by molecular evolution analysis such as the KA/KS analysis, can
be further assessed to determine whether the non-human primate gene or the human gene
has been subjected to positive selection. For example, the identified changes can be tested
for presence/absence in other AIDS-resistant non-human primate sequences. The
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CCR2B 

chemokine receptor 

CCR3                                                  chemokine receptor
Bonzo                                                 chemokine receptor
BOB                                                   chemokine receptor
GPR1                                                  chemokine receptor
stromal-derived factor-1 (SDF-1)                      chemokine
tumor-necrosis factor-α (TNF-α)                       PCD promoter
TNF-receptor II (TNFRII)                              receptor
interferon γ (IFN-γ)                                  cytokine
interleukin 1α (IL-1α)                                cytokine
interleukin 1β (IL-1β)                                cytokine
interleukin 2 (IL-2)                                  cytokine
interleukin 4 (IL-4)                                  cytokine
interleukin 6 (IL-6)                                  cytokine
interleukin 10 (IL-10)                                cytokine
interleukin 13 (IL-13)                                cytokine
B7                                                    signaling protein
macrophage colony-stimulating factor (M-CSF)          cytokine
granulocyte-macrophage colony-stimulating factor      cytokine
phosphatidyl inositol 3-kinase (PI 3-kinase)          kinase
phosphatidyl inositol 4-kinase (PI 4-kinase)          kinase
HLA class I α chain                                   histocompatibility antigen
β2 microglobulin                                      lymphocyte antigen
CD55                                                  decay-accelerating factor
CD59                                                  complement protein
CD63                                                  glycoprotein antigen
CD71
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of: (a) comparing human protein coding sequences to protein-coding polynucleotide 
sequences of an infectious disease-resistant non-human primate, wherein the human protein 
coding sequence is associated with development of the infectious disease; and (b) selecting 
a human polynucleotide sequence that contains at least one nucleotide change as compared 
to the corresponding sequence of an infectious disease-resistant non-human primate,
wherein the nucleotide change is evolutionarily significant. 

In the present invention, human sequences to be compared with a homologue from
an AIDS-resistant non-human primate are selected based on their known or implicated
association with HIV propagation (i.e., replication), dissemination and/or subsequent
progression to AIDS. Such knowledge is obtained, for example, from published literature
and/or public databases (including sequence databases such as GenBank). Because the
pathway involved in development of AIDS (including viral replication) involves many
genes, a number of suitable candidates may be tested using the methods of this invention.
Table 1 contains an exemplary list of genes to be examined. The sequences are generally
known in the art.

Table 1: Sample List of Human Genes to be/have been Examined

Gene                                                  Function
eIF-5A                                                Initiation factor
hPC6A                                                 Protease
hPC6B                                                 Protease
p56lck                                                Signal transduction
FK506-binding protein                                 Immunophilin
calnexin                                              ?
Bax                                                   PCD promoter
bcl-2                                                 apoptosis inhibitor
lck                                                   tyrosine kinase
MAPK (mitogen activated protein kinase)               protein kinase
CD43                                                  sialoglycoprotein
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prevention and/or therapy for HIV infection progressing to AIDS. These differences could 
also be used in developing diagnostic reagents and/or biomedical research tools. For 
example, identification of proteins which confer resistance may allow development of 
diagnostic reagents or biomedical research tools based upon the disruption of the disease 
pathway of which the resistant protein plays a part.

Generally, the methods described herein entail (a) comparing human protein-coding
polynucleotide sequences to those of an AIDS-resistant non-human primate (such as
chimpanzee), wherein the human protein-coding polynucleotide sequence is associated with
development of AIDS; and (b) selecting those human protein-coding polynucleotide
sequences having evolutionarily significant changes that may be associated with
susceptibility to development of AIDS. In another embodiment, the methods entail (a)
comparing human protein-coding polynucleotide sequences to those of an AIDS-resistant
non-human primate (such as chimpanzee), wherein the human protein-coding
polynucleotide sequence is associated with development of AIDS; and (b) selecting those
non-human primate protein-coding polynucleotide sequences having evolutionarily
significant changes that may be associated with resistance to development of AIDS.
As is evident, the methods described herein can be applied to other infectious 
diseases. For example, the methods could be used in a situation in which a non-human 
primate is known or believed to have harbored the infectious disease for a significant period 
(i.e., a sufficient time to have allowed positive selection) and is resistant to development of
the disease. Thus, in other embodiments, the invention provides methods for identifying a 
polynucleotide sequence encoding a polypeptide, wherein said polypeptide may be 
associated with resistance to development of an infectious disease, comprising the steps of: 
(a) comparing infectious disease-resistant non-human primate protein coding sequences to 
human protein coding sequences, wherein the human protein coding sequence is associated
with development of the infectious disease; and (b) selecting an infectious disease-resistant 
non-human primate sequence that contains at least one nucleotide change as compared to 
the corresponding human sequence, wherein the nucleotide change is evolutionarily 
significant. In another embodiment, the invention provides methods for identifying a 
human polynucleotide sequence encoding a polypeptide, wherein said polypeptide may be
associated with susceptibility to development of an infectious disease, comprising the steps 
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chimpanzee resistance to AIDS progression. Burnet and White (1972); Natural History of 
Infectious Disease (Cambridge, Cambridge Univ. Press); Ewald (1991) Hum. Nat. 2(1):1-30.
All references cited herein are hereby incorporated by reference in their entirety.

One aspect of this invention arises from the observations that (a) because
chimpanzees (Pan troglodytes) have displayed resistance to development of AIDS although
susceptible to HIV infection (Alter et al. (1984) Science 226:549-552; Fultz et al. (1986) J.
Virol. 58:116-124; Novembre et al. (1997) J. Virol. 71(5):4086-4091), while humans are
susceptible to developing this devastating disease, certain genes in chimpanzees may
contribute to this resistance; and (b) it is possible to evaluate whether changes in human
genes when compared to homologous genes from other species (such as chimpanzee) are
evolutionarily significant (i.e., indicating positive selective pressure). Thus, protein-coding
polynucleotides may contain sequence changes that are found in chimpanzees (as well as
other AIDS-resistant primates) but not in humans, likely as a result of positive adaptive
selection during evolution. Furthermore, such evolutionarily significant changes in
polynucleotide and polypeptide sequences may be attributed to an AIDS-resistant
non-human primate's (such as chimpanzee) ability to resist development of AIDS. The methods
of this invention employ selective comparative analysis to identify candidate genes which 
may be associated with susceptibility or resistance to AIDS, which may provide new host 
targets for therapeutic intervention as well as specific information on the changes that 
evolved to confer resistance. Development of therapeutic approaches that involve host 
proteins (as opposed to viral proteins and/or mechanisms) may delay or even avoid the 
emergence of resistant viral mutants. The invention also provides screening methods using 
the sequences and structural differences identified. 

This invention provides methods for identifying human polynucleotide and 
polypeptide sequences that may be associated with susceptibility to post-infection 
development of AIDS. Conversely, the invention also provides methods for identifying 
polynucleotide and polypeptide sequences from an AIDS-resistant non-human primate 
(such as chimpanzee) that may be associated with resistance to development of AIDS. 
Identifying the genetic (i.e., nucleotide sequence) and the resulting protein structural and
biochemical differences underlying susceptibility or resistance to development of AIDS 
will likely provide a basis for discovering and/or designing agents that can provide 
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Description of the AIDS Embodiment (an example of a positively selected non- 
human trait) 

The AIDS (Acquired Immune Deficiency Syndrome) epidemic has been estimated
to threaten 30 million people world-wide (UNAIDS/WHO, 1998, "Report on the global
HIV/AIDS epidemic"). Well over a million people are infected in developed countries, and
in parts of sub-Saharan Africa, 1 in 4 adults now carries the virus (UNAIDS/WHO, 1998).
Although efforts to develop vaccines are underway, near term prospects for successful
vaccines are grim. Balter and Cohen (1998) Science 281:159-160; Baltimore and Heilman
(1998) Scientific Am. 279:98-103. Further complicating the development of therapeutics is
the rapid mutation rate of HIV (the human immunodeficiency virus which is responsible for
AIDS), which generates rapid changes in viral proteins. These changes ultimately allow
the virus to escape current therapies, which target viral proteins. Dobkin (1998) Inf. Med.
15(3):159. Even drug cocktails which initially showed great promise are subject to the
emergence of drug-resistant mutants. Balter and Cohen (1998); Dobkin (1998). Thus,
there is still a serious need for development of therapies which delay or prevent progression
of AIDS in HIV-infected individuals. Chun et al. (1997) Proc. Natl. Acad. Sci. USA
94:13193-13197; Dobkin (1998).

Humans' closest relatives, chimpanzees (Pan troglodytes), have unexpectedly
proven to be poor models for the study of the disease processes following infection with
HIV-1. Novembre et al. (1997) J. Virol. 71(5):4086-4091. Once infected with HIV-1,
chimpanzees display resistance to progression of the disease. To date, only one
chimpanzee individual is known to have developed full-blown AIDS, although more than
100 captive chimpanzees have been infected. Novembre et al. (1997); Villinger et al.
(1997) J. Med. Primatol. 26(1-2):11-18. Clearly, an understanding of the mechanism(s)
that confer resistance to progression of the disease in chimpanzees may prove invaluable
for efforts to develop therapeutic agents for HIV-infected humans.

It is generally believed that wild chimpanzee populations harbored the HIV-1 virus
(perhaps for millennia) prior to its recent cross-species transmission to humans. Dube et
al. (1994) Virology 202:379-389; Zhu and Ho (1995) Nature 374:503-504; Zhu et al.
(1998); Quinn (1994) Proc. Natl. Acad. Sci. USA 91:2407-2414. During this extended
period, viral/host co-evolution has apparently resulted in accommodation, explaining
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provide a basis for designing agents that can modulate human brain functions and/or help 
correct functional defects. These sequences could also be used in developing diagnostic 
reagents and/or biomedical research tools. The invention also provides methods for a large- 
scale comparison of human brain protein-coding sequences with that from a non-human 
primate. 

The identified human sequence changes can be used in establishing a database of 
candidate human genes that may be involved in human brain function. Candidates are 
ranked as to the likelihood that the gene is responsible for the unique or enhanced 
functional capabilities found in the human brain compared to chimpanzee or other non- 
human primates. Moreover, the database not only provides an ordered collection of 
candidate genes, it also provides the precise molecular sequence differences that exist 
between human and chimpanzee (and other non-human primates), and thus defines the 
changes that underlie the functional differences. This information can be useful in the 
identification of potential sites on the protein that may serve as useful targets for 
pharmaceutical agents. 

Accordingly, the present invention also provides methods for correlating an 
evolutionarily significant nucleotide change to a brain functional capability that is unique
or enhanced in humans, comprising (a) identifying a human nucleotide sequence according 
to the methods described above; and (b) analyzing the functional effect of the presence or 
absence of the identified sequence in a model system. 

Further studies can be carried out to confirm putative function. For example, the 
putative function can be assayed in appropriate in vitro assays using transiently or stably 
transfected mammalian cells in culture, or using mammalian cells transfected with an 
antisense clone to inhibit expression of the identified polynucleotide to assess the effect of 
the absence of expression of its encoded polypeptide. Studies such as one-hybrid and two- 
hybrid studies can be conducted to determine, for example, what other macromolecules the 
polypeptide interacts with. Transgenic nematodes or Drosophila can be used for various 
functional assays, including behavioral studies. The appropriate studies depend on the 
nature of the identified polynucleotide and the polypeptide encoded within the 
polynucleotide, and would be obvious to those skilled in the art. 
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METHODS USEFUL FOR IDENTIFYING POSITIVELY SELECTED HUMAN 
TRAITS 

This invention specifically provides methods for identifying human polynucleotide
and polypeptide sequences that may be associated with unique or enhanced functional
capabilities of the human, for example, brain function or longer life span. More
particularly, these methods identify those genetic sequences that may be associated with
capabilities that are unique or enhanced in humans, including, but not limited to, brain
functions such as high capacity information processing, storage and retrieval capabilities,
creativity, and language abilities. Moreover, these methods identify those sequences that
may be associated with other brain functional features with respect to which the human brain
performs at enhanced levels as compared to non-human primates; these differences
may include brain-mediated emotional response, locomotion, pain/pleasure sensation,
olfaction, temperament and longer life span.

In this method, the general methods of the invention are applied as described above.
Generally, the methods described herein entail (a) comparing human protein-coding
polynucleotide sequences to those of a non-human primate; and (b) selecting those human
protein-coding polynucleotide sequences having evolutionarily significant changes that
may be associated with unique or enhanced functional capabilities of the human as
compared to those of the non-human primate.

In this embodiment, the human sequence includes the evolutionarily significant
change (i.e., the human sequence differs from more than one non-human primate species
sequence in a manner that suggests that such a change is in response to a selective
pressure). The identity and function of the protein encoded by the gene that contains the
evolutionarily significant change are characterized and a determination is made whether or
not the protein can be involved in a unique or enhanced human function. If the protein is
involved in a unique or enhanced human function, the information is used to
identify agents that can supplement or otherwise modulate the unique or enhanced human
function.
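
The comparison against more than one non-human primate species can be sketched in code. The following is a minimal illustration, assuming the sequences have already been aligned to equal length; the function name and the toy sequences are hypothetical and are not taken from the specification.

```python
def human_specific_changes(human, others):
    """Return (position, primate_base, human_base) for sites where the human
    sequence differs from every non-human primate sequence and those primate
    sequences all agree with one another. Positions are 1-based."""
    changes = []
    for i, h in enumerate(human):
        column = {seq[i] for seq in others.values()}
        if len(column) == 1 and h not in column:
            changes.append((i + 1, next(iter(column)), h))
    return changes

# Toy, pre-aligned sequences (invented for illustration).
human = "ATGCCA"
primates = {"chimpanzee": "ATGCAA", "gorilla": "ATGCAA"}
result = human_specific_changes(human, primates)
```

A site reported here is only a candidate; whether it reflects a selective pressure still requires the molecular evolution analysis described elsewhere in this document.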

As a non-limiting example of the invention, identifying the genetic (i.e.,
nucleotide sequence) differences underlying the functional uniqueness of human brain may
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limited to, polynucleotides and polypeptides that have been positively selected in non- 
human primates, preferably chimpanzees, that may be associated with susceptibility or 
resistance to infectious diseases, cancer, or acne or may be associated with aesthetic 
conditions of interest to humans, such as hair growth or muscle mass. An example of this 
embodiment includes polynucleotides and polypeptides associated with the susceptibility or 
resistance to HIV progression to AIDS. The present invention can thus be useful in gaining 
insight into the molecular mechanisms that underlie resistance to HIV infection progressing 
to development of AIDS, providing information that can also be useful in discovering 
and/or designing agents such as drugs that prevent and/or delay development of AIDS. 
Commercially relevant examples include, but are not limited to, polynucleotides and 
polypeptides that are positively selected in non-human primates that may be associated 
with aesthetic traits, such as hair growth, acne, or muscle mass. 

Accordingly, in one aspect, the invention provides methods for identifying a
polynucleotide sequence encoding a polypeptide, wherein said polypeptide may be
associated with a medically relevant positive evolutionarily significant change. The
positive evolutionarily significant change can be found in humans or in non-human
primates, but the positively selected non-human primate evolutionarily significant change
will be described first herein. The method comprises the steps of: (a) comparing human
protein-coding nucleotide sequences to protein-coding nucleotide sequences of a non-
human primate; and (b) selecting a human polynucleotide sequence that contains at least
one nucleotide change as compared to the corresponding sequence of the non-human primate,
wherein said change is evolutionarily significant. The sequences identified by this method
may be further characterized and/or analyzed for their possible association with biologically
or medically relevant functions unique or enhanced in humans.

Also provided in the present invention is a method for identifying a positive
evolutionarily significant change within human protein-coding nucleotide sequences,
comprising the steps of: (a) comparing human protein-coding nucleotide sequences to
corresponding sequences of a non-human primate; and (b) selecting a human
polynucleotide sequence that contains at least one nucleotide change as compared to the
corresponding sequence of the non-human primate, wherein said change is evolutionarily
significant.
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It is also understood that the in vitro screening methods of this invention include 
structural, or rational, drug design, in which the amino acid sequence, three-dimensional 
atomic structure or other property (or properties) of a polypeptide provides a basis for 
designing an agent which is expected to bind to a polypeptide. Generally, the design and/or
choice of agents in this context is governed by several parameters, such as side-by-side
comparison of the structures of human and homologous non-human primate polypeptides,
the perceived function of the polypeptide target, its three-dimensional structure (if known
or surmised), and other aspects of rational drug design. Techniques of combinatorial
chemistry can also be used to generate numerous permutations of candidate agents.

Also contemplated in screening methods of the invention are transgenic animal
systems, which are known in the art.

The screening methods described above represent primary screens, designed to
detect any agent that may exhibit activity that modulates the function of a polynucleotide or
polypeptide. The skilled artisan will recognize that secondary tests will likely be necessary
in order to evaluate an agent further. For example, a secondary screen may comprise
testing the agent(s) in an infectivity assay using mice and other animal models (such as rat),
which are known in the art. In addition, a cytotoxicity assay would be performed as a
further corroboration that an agent which tested positive in a primary screen would be
suitable for use in living organisms. Any assay for cytotoxicity would be suitable for this
purpose, including, for example, the MTT assay (Promega).

The invention also includes agents identified by the screening methods described
herein.

METHODS USEFUL FOR IDENTIFYING POSITIVELY SELECTED NON-
HUMAN TRAITS

In one aspect of the invention, a non-human primate polynucleotide or polypeptide 
has undergone natural selection that resulted in a positive evolutionarily significant change 
(i.e., the non-human primate polynucleotide or polypeptide has a positive attribute not 
present in humans). In this aspect of the invention, the positively selected polynucleotide
or polypeptide may be associated with susceptibility or resistance to certain diseases or
with other commercially relevant traits. Examples of this embodiment include, but are not
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modulation of polypeptide function can be accomplished in many ways including, but not 
limited to, the in vivo and in vitro assays listed above as well as in in vitro assays using 
protein preparations. Polypeptides can be extracted and/or purified from natural or 
recombinant sources to create protein preparations. An agent can be added to a sample of a
protein preparation and the effect monitored, that is, whether and how the agent acts on the
polypeptide. An agent that affects the conformation, folding (or other physical
characteristics), binding to other moieties (such as ligands), activity (or other functional
characteristics), and/or other aspects of protein structure or functions is considered to
have modulated polypeptide function.

In an example of an assay for an agent that binds to a polypeptide encoded by a
polynucleotide identified by the methods described herein, a polypeptide is first
recombinantly expressed in a prokaryotic or eukaryotic expression system as a native protein
or as a fusion protein in which a polypeptide (encoded by a polynucleotide identified as
described above) is conjugated with a well-characterized epitope or protein. Recombinant
polypeptide is then purified by, for instance, immunoprecipitation using appropriate
antibodies or anti-epitope antibodies or by binding to immobilized ligand of the conjugate.
An affinity column made of polypeptide or fusion protein is then used to screen a mixture
of compounds which have been appropriately labeled. Suitable labels include, but are not
limited to, fluorochromes, radioisotopes, enzymes and chemiluminescent compounds. The
unbound and bound compounds can be separated by washes using various conditions (e.g.,
high salt, detergent) that are routinely employed by those skilled in the art. Non-specific
binding to the affinity column can be minimized by pre-clearing the compound mixture
using an affinity column containing merely the conjugate or the epitope. Similar methods
can be used for screening for an agent(s) that competes for binding to polypeptides. In
addition to affinity chromatography, there are other techniques such as measuring the
change of melting temperature or the fluorescence anisotropy of a protein which will
change upon binding another molecule. For example, a BIAcore assay using a sensor chip
(supplied by Pharmacia Biosensor, Stitt et al. (1995) Cell 80: 661-670) that is covalently
coupled to polypeptide may be performed to determine the binding activity of different
agents.
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half-life) of the polypeptide. The specificity of the agent with regard to the mRNA and 
polypeptide would be determined by examining the products in the absence of the agent 
and by examining the products of unrelated mRNAs and polypeptides. Methods to 
examine mRNA half-life, protein processing, and protein turn-over are well known to those
skilled in the art.

In vivo screening methods could also be useful in the identification of agents that
modulate polypeptide function through the interaction with the polypeptide directly. Such
agents could inhibit normal polypeptide-ligand interactions, if any, or could augment or
stabilize such interactions. Such agents could also alter a conformation of the polypeptide.

The effect of the agent could be determined using immunoprecipitation reactions.
Appropriate antibodies would be used to precipitate the polypeptide and any protein tightly
associated with it. By comparing the polypeptides immunoprecipitated from treated cells
and from untreated cells, an agent could be identified that would augment or inhibit
polypeptide-ligand interactions, if any. Polypeptide-ligand interactions could also be
assessed using cross-linking reagents that convert a close, but noncovalent interaction
between polypeptides into a covalent interaction. Techniques to examine protein-protein
interactions are well known to those skilled in the art. Techniques to assess protein
conformation are also well known to those skilled in the art.

It is also understood that screening methods can involve in vitro methods, such as
cell-free transcription or translation systems. In those systems, transcription or translation
is allowed to occur, and an agent is tested for its ability to modulate function. For an assay
that determines whether an agent modulates the translation of mRNA or a polynucleotide,
an in vitro transcription/translation system may be used. These systems are available
commercially and provide an in vitro means to produce mRNA corresponding to a
polynucleotide sequence of interest. After mRNA is made, it can be translated in vitro and
the translation products compared. Comparison of translation products between an in vitro
expression system that does not contain any agent (negative control) and an in vitro
expression system that does contain an agent indicates whether the agent is affecting
translation. Comparison of translation products between control and test polynucleotides
indicates whether the agent, if acting on this level, is selectively affecting translation (as
opposed to affecting translation in a general, non-selective or non-specific fashion). The
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in the art. "Transfection" encompasses any method of introducing the exogenous
sequence, including, for example, lipofection, transduction, infection or electroporation. 
The exogenous polynucleotide may be maintained as a non-integrated vector (such as a 
plasmid) or may be integrated into the host genome. 
To identify agents that specifically activate transcription, transcription regulatory
regions could be linked to a reporter gene and the construct added to an appropriate host
cell. As used herein, the term "reporter gene" means a gene that encodes a gene product
that can be identified (i.e., a reporter protein). Reporter genes include, but are not limited
to, alkaline phosphatase, chloramphenicol acetyltransferase, β-galactosidase, luciferase and
green fluorescent protein (GFP). Identification methods for the products of reporter genes
include, but are not limited to, enzymatic assays and fluorimetric assays. Reporter genes
and assays to detect their products are well known in the art and are described, for example,
in Ausubel et al. (1987) and periodic updates. Reporter genes, reporter gene assays, and
reagent kits are also readily available from commercial sources. Examples of appropriate
cells include, but are not limited to, fungal, yeast, mammalian, and other eukaryotic cells.
A practitioner of ordinary skill will be well acquainted with techniques for transfecting
eukaryotic cells, including the preparation of a suitable vector, such as a viral vector;
conveying the vector into the cell, such as by electroporation; and selecting cells that have
been transformed, such as by using a reporter or drug sensitivity element. The effect of an
agent on transcription from the regulatory region in these constructs would be assessed
through the activity of the reporter gene product.

Besides increasing expression under conditions in which it is normally repressed, as
mentioned above, expression could be decreased under conditions in which the polynucleotide
would normally be expressed. An agent could accomplish this through a decrease in
transcription rate and the reporter gene system described above would be a means to assay
for this. The host cells used to assess such agents would need to be permissive for
expression.

Cells transcribing mRNA (from the polynucleotide of interest) could be used to
identify agents that specifically modulate the half-life of mRNA and/or the translation of
mRNA. Such cells would also be used to assess the effect of an agent on the processing
and/or post-translational modification of the polypeptide. An agent could modulate the
amount of polypeptide in a cell by modifying the turn-over (i.e., increase or decrease the
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added to the assay system are ineffective to elicit at least one characteristic which is 
associated with modulation of polynucleotide or polypeptide function, but that are modified
by cellular components once inside a cell in such a way that they become effective agents;
3) most importantly, an in vivo assay system allows identification of agents affecting any
component of a pathway that ultimately results in characteristics that are associated with
polynucleotide or polypeptide function.

In general, screening can be performed by adding an agent to a sample of
appropriate cells which have been transfected with a polynucleotide identified using the
methods of the present invention, and monitoring the effect, i.e., modulation of a function
of the polynucleotide or the polypeptide encoded within the polynucleotide. The
experiment preferably includes a control sample which does not receive the candidate
agent. The treated and untreated cells are then compared by any suitable phenotypic
criteria, including but not limited to microscopic analysis, viability testing, ability to
replicate, histological examination, the level of a particular RNA or polypeptide associated
with the cells, the level of enzymatic activity expressed by the cells or cell lysates, the
interactions of the cells when exposed to infectious agents, such as HIV, and the ability of
the cells to interact with other cells or compounds. For example, the transfected cells can
be exposed to the agent to be tested and, before, during, or after treatment with the agent,
the cells can be infected with a virus, such as HIV, and tested for any indication of
susceptibility of the cells to viral infection, including, for example, susceptibility of the
cells to cell-to-cell viral infection, replication of the virus, production of a viral protein,
and/or syncytia formation following infection with the virus. Differences between treated
and untreated cells indicate effects attributable to the candidate agent. Optimally, the agent
has a greater effect on experimental cells than on control cells. Appropriate host cells
include, but are not limited to, eukaryotic cells, preferably mammalian cells. The choice of
cell will at least partially depend on the nature of the assay contemplated.
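
The treated-versus-untreated comparison described above can be illustrated with a small numerical sketch. It assumes a single quantitative readout per sample (for example, a level of a particular RNA or an enzymatic activity); the function names and the numbers are hypothetical, and a real screen would add replicates and a statistical test.

```python
from statistics import mean

def fold_change(treated, untreated):
    """Ratio of the mean readout in treated cells to that in untreated cells."""
    return mean(treated) / mean(untreated)

def selectivity(exp_treated, exp_untreated, ctl_treated, ctl_untreated):
    """How much larger the agent's effect is in experimental (transfected)
    cells than in control cells; values above 1 favor the experimental cells."""
    return (fold_change(exp_treated, exp_untreated)
            / fold_change(ctl_treated, ctl_untreated))

# Invented readouts for demonstration only.
effect = fold_change([2.0, 4.0], [1.0, 2.0])
ratio = selectivity([2.0, 4.0], [1.0, 2.0], [1.0, 1.0], [1.0, 1.0])
```

A selectivity near 1 would suggest the agent acts on both cell types alike, which is the non-selective outcome the text seeks to rule out.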

To test for agents that upregulate the expression of a polynucleotide, a suitable host
cell transfected with a polynucleotide of interest, such that the polynucleotide is expressed
(as used herein, expression includes transcription and/or translation), is contacted with an
agent to be tested. An agent would be tested for its ability to result in increased expression
of mRNA and/or polypeptide. Methods of making vectors and transfection are well known
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act to modulate the expression of a polynucleotide include, but are not limited to 1) 
modifying binding of a transcription factor to a transcription factor responsive element in
the polynucleotide; 2) modifying the interaction between two transcription factors
necessary for expression of the polynucleotide; 3) altering the ability of a transcription
factor necessary for expression of the polynucleotide to enter the nucleus; 4) inhibiting the
activation of a transcription factor involved in transcription of the polynucleotide; 5)
modifying a cell-surface receptor which normally interacts with a ligand and whose binding
of the ligand results in expression of the polynucleotide; 6) inhibiting the inactivation of a
component of the signal transduction cascade that leads to expression of the polynucleotide;
and 7) enhancing the activation of a transcription factor involved in transcription of the
polynucleotide.

A "function" of a polypeptide includes, but is not limited to, conformation, folding 
(or other physical characteristics), binding to other moieties (such as ligands), activity (or 
other functional characteristics), and/or other aspects of protein structure or functions. For
example, an agent that acts on a polypeptide and affects its conformation, folding (or other
physical characteristics), binding to other moieties (such as ligands), activity (or other
functional characteristics), and/or other aspects of protein structure or functions is
considered to have modulated polypeptide function. The ways that an effective agent can
act to modulate the function of a polypeptide include, but are not limited to, 1) changing the
conformation, folding or other physical characteristics; 2) changing the binding strength to
its natural ligand or changing the specificity of binding to ligands; and 3) altering the
activity of the polypeptide.

Generally, the choice of agents to be screened is governed by several parameters,
such as the particular polynucleotide or polypeptide target, its perceived function, its
three-dimensional structure (if known or surmised), and other aspects of rational drug design.
Techniques of combinatorial chemistry can also be used to generate numerous permutations
of candidates. Those of skill in the art can devise and/or obtain suitable agents for testing.

The in vivo screening assays described herein may have several advantages over
conventional drug screening assays: 1) if an agent must enter a cell to achieve a desired
therapeutic effect, an in vivo assay can give an indication as to whether the agent can enter a
cell; 2) an in vivo screening assay can identify agents that, in the state in which they are
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Screening methods 

The present invention also provides screening methods using the polynucleotides
and polypeptides identified and characterized using the above-described methods. These
screening methods are useful for identifying agents which may modulate the function(s) of
the polynucleotides or polypeptides in a manner that would be useful for a human
treatment. Generally, the methods entail contacting at least one agent to be tested with
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either a cell that has been transfected with a polynucleotide sequence identified by the 
methods described above, or a preparation of the polypeptide encoded by such
polynucleotide sequence, wherein an agent is identified by its ability to modulate function
of either the polynucleotide sequence or the polypeptide.

As used herein, the term "agent" means a biological or chemical compound such as
a simple or complex organic or inorganic molecule, a peptide, a protein or an
oligonucleotide. A vast array of compounds can be synthesized, for example oligomers,
such as oligopeptides and oligonucleotides, and synthetic organic and inorganic compounds
based on various core structures, and these are also included in the term "agent". In
addition, various natural sources can provide compounds for screening, such as plant or
animal extracts, and the like. Compounds can be tested singly or in combination with one
another.

To "modulate function" of a polynucleotide or a polypeptide means that the 
function of the polynucleotide or polypeptide is altered when compared to not adding an 
agent. Modulation may occur on any level that affects function. A polynucleotide or
polypeptide function may be direct or indirect, and measured directly or indirectly. A
"function" of a polynucleotide includes, but is not limited to, replication, translation, and
expression pattern(s). A polynucleotide function also includes functions associated with a
polypeptide encoded within the polynucleotide. For example, an agent which acts on a
polynucleotide and affects protein expression, conformation, folding (or other physical
characteristics), binding to other moieties (such as ligands), activity (or other functional
characteristics), regulation and/or other aspects of protein structure or function is
considered to have modulated polynucleotide function. The ways that an effective agent can
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timing of the gene expression during each stage of human development can also be 
determined. 

As another exemplary method of sequence characterization, the functional roles of
the identified nucleotide sequences with significant changes can be assessed by conducting
functional assays for different alleles of an identified gene in a model system, such as yeast,
nematode, Drosophila, and mouse. Model systems may be cell-based or in vivo, such as
transgenic animals. Preferably, the transgenic mouse system is used. Methods of making
cell-based systems and/or transgenic animal systems are known in the art and need not be
described in detail herein.

As another exemplary method of sequence characterization, the use of computer
programs allows modeling and visualizing the three-dimensional structure of the
homologous proteins from human and chimpanzee. Specific, exact knowledge of which
amino acids have been replaced in the chimpanzee protein(s) allows detection of structural
changes that may be associated with functional differences. Thus, use of modeling
techniques is closely associated with identification of functional roles discussed in the
previous paragraph. The use of individual or combinations of these techniques constitutes
part of the present invention. For example, chimpanzee ICAM-3 contains a glutamine
residue (Q101) at the site in which human ICAM-3 contains a proline (P101). The human
protein is known to bend sharply at this point. Replacement of the proline by glutamine in
the chimpanzee protein is likely to result in a much less sharp bend at this point. This has
clear implications for packaging of the ICAM-3 chimpanzee protein into HIV virions.
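
Before any structural modeling, the replaced amino acids themselves can be listed from a pair of aligned protein sequences. The sketch below is illustrative only: the function name is hypothetical, the sequences are short invented strings rather than real ICAM-3 sequences, and the two inputs are assumed to be aligned and of equal length.

```python
def substitutions(reference, other):
    """List amino acid replacements as '<reference residue><position><other residue>',
    using 1-based positions, for two aligned sequences of equal length."""
    if len(reference) != len(other):
        raise ValueError("sequences must be aligned to the same length")
    return [f"{r}{i}{o}"
            for i, (r, o) in enumerate(zip(reference, other), start=1)
            if r != o]

# Toy sequences: a proline in the first is replaced by glutamine in the second.
subs = substitutions("MKPLT", "MKQLT")

# Replacements that remove a proline may deserve a closer structural look,
# since proline often introduces bends in a polypeptide chain.
proline_losses = [s for s in subs if s.startswith("P")]
```

Applied to full-length aligned human and chimpanzee sequences, the same notation would report a proline-to-glutamine change at position 101 as "P101Q".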

The sequences identified by the methods described herein have significant uses in
diagnosis and treatment of medically or commercially relevant human conditions.
Accordingly, the present invention provides methods for identifying agents that are useful
in modulating human-unique or human-enhanced functional capabilities and/or correcting
defects in these capabilities using these sequences. These methods employ, for example,
screening techniques known in the art, such as in vitro systems, cell-based expression
systems and transgenic animal systems. The approach provided by the present invention not
only identifies rapidly evolved genes, but indicates modulations that can be made to the
protein that may not be too toxic because they exist in another species.
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Stewart (1997) Nature 385:151-154. Those genes showing statistically high K_A/K_S ratios
are very likely to have undergone adaptive evolution. 

Sequences with significant changes can be used as probes in genomes from different
human populations to see whether the sequence changes are shared by more than one
human population. Gene sequences from different human populations can be obtained
from databases made available by, for example, the Human Genome Project, the human
genome diversity project or, alternatively, from direct sequencing of PCR-amplified DNA
from a number of unrelated, diverse human populations. The presence of the identified
changes in different human populations would further indicate the evolutionary significance
of the changes. Chimpanzee sequences with significant changes can be obtained and
evaluated using similar methods to determine whether the sequence changes are shared
among many chimpanzees.

Sequences with significant changes between species can be further characterized in
terms of their molecular/genetic identities and biological functions, using methods and
techniques known to those of ordinary skill in the art. For example, the sequences can be
located genetically and physically within the human genome using publicly available
bioinformatics programs. The newly identified significant changes within the nucleotide
sequence may suggest a potential role of the gene in human evolution and a potential
association with human-unique functional capabilities. The putative gene with the
identified sequences may be further characterized by, for example, homologue searching.
Shared homology of the putative gene with a known gene may indicate a similar biological
role or function. Another exemplary method of characterizing a putative gene sequence is
on the basis of known sequence motifs. Certain sequence patterns are known to code for
regions of proteins having specific biological characteristics such as signal sequences, DNA
binding domains, or transmembrane domains.
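
One simple way to look for a pattern suggestive of a transmembrane domain is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a cutoff of 1.6; this particular method is an assumption chosen for illustration and is not named in the text. The function name and the test sequence are hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_windows(protein, window=19, threshold=1.6):
    """Return (start, mean_hydropathy) for every window whose mean
    hydropathy reaches the threshold. Start positions are 1-based."""
    hits = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        score = sum(KD[aa] for aa in segment) / window
        if score >= threshold:
            hits.append((start + 1, round(score, 2)))
    return hits

# Invented sequence: a leucine stretch flanked by lysines.
hits = hydrophobic_windows("K" * 10 + "L" * 19 + "K" * 10)
```

Windows that pass the cutoff are only candidates; dedicated motif databases and prediction tools would normally be consulted to confirm a domain assignment.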

The identified human sequences with significant changes can also be further
evaluated by looking at where the gene is expressed in terms of tissue- or cell type-
specificity. For example, the identified coding sequences can be used as probes to perform
in situ mRNA hybridization that will reveal the expression patterns of the sequences.
Genes that are expressed in certain tissues may be better candidates as being associated
with important human functions associated with that tissue, for example brain tissue. The
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sequences. Such sequences may include, but are not limited to, non-coding sequences or 
coding sequences that do not encode human proteins. These related sequences can be, for 
example, physically adjacent to the human protein-coding sequences in the human genome, 
such as introns or 5'- and 3'- flanking sequences (including control elements such as 
promoters and enhancers). These related sequences may be obtained via searching a public 
human genome database such as GenBank or, alternatively, by screening and sequencing a 
human genomic library with a protein-coding sequence as probe. Methods and techniques 
for obtaining non-coding sequences using a related coding sequence are well known to one
skilled in the art.

The evolutionarily significant nucleotide changes, which are detected by molecular 
evolution analysis such as the K_A/K_S analysis, can be further assessed for their unique
occurrence in humans (or the non-human primate) or the extent to which these changes are 
unique in humans (or the non-human primate). For example, the identified changes can be 
tested for presence/absence in other non-human primate sequences. The sequences with at 
least one evolutionarily significant change between human and one non-human primate can 
be used as primers for PCR analysis of other non-human primate protein-coding sequences, 
and resulting polynucleotides are sequenced to see whether the same change is present in 
other non-human primates. These comparisons allow further discrimination as to whether 
the adaptive evolutionary changes are unique to the human lineage as compared to other 
non-human primates or whether the adaptive change is unique to the non-human primate
(e.g., chimpanzee) as compared to humans and other non-human primates. A nucleotide
change that is detected in human but not other primates more likely represents a human 
adaptive evolutionary change. Alternatively, a nucleotide change that is detected in a non-
human primate (e.g., chimpanzee) that is not detected in humans or other non-human
primates likely represents a chimpanzee adaptive evolutionary change. Other non-human 
primates used for comparison can be selected based on their phylogenetic relationships with 
human. Closely related primates can be those within the hominoid sublineage, such as 
chimpanzee, bonobo, gorilla, and orangutan. Non-human primates can also be those that 
are outside the hominoid group and thus not so closely related to human, such as the Old 
World monkeys and New World monkeys. Statistical significance of such comparisons 
may be determined using established available programs, e.g., t-test as used by Messier and
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about 1.0, more preferably at least about 1.25, more preferably at least about 1.50, or more
preferably at least about 2.00. Preferably, statistical analysis is performed on all elevated
K_A/K_S ratios, including, but not limited to, standard methods such as Student's t-test and
likelihood ratio tests described by Yang (1998) Mol Biol Evol 37:441-456.
KA/KS ratios significantly greater than unity strongly suggest that positive selection
has fixed greater numbers of amino acid replacements than can be expected as a result of
chance alone; this is in contrast to the commonly observed pattern in which the ratio is less
than or equal to one. Nei (1987); Hughes and Nei (1988) Nature 335:167-170; Messier and
Stewart (1994) Current Biol 4:911-913; Kreitman and Akashi (1995) Annu. Rev. Ecol. Syst.
26:403-422; Messier and Stewart (1997). Ratios less than one generally signify the role of
negative, or purifying selection: there is strong pressure on the primary structure of
functional, effective proteins to remain unchanged.

All methods for calculating KA/KS ratios are based on a pairwise comparison of the
number of nonsynonymous substitutions per nonsynonymous site to the number of
synonymous substitutions per synonymous site for the protein-coding regions of
homologous genes from related species. Each method implements different corrections for
estimating "multiple hits" (i.e., more than one nucleotide substitution at the same site).
Each method also uses different models for how DNA sequences change over evolutionary
time. Thus, preferably, a combination of results from different algorithms is used to
increase the level of sensitivity for detection of positively-selected genes and confidence in
the result.

Preferably, KA/KS ratios should be calculated for orthologous gene pairs, as opposed
to paralogous gene pairs (i.e., a gene which results from speciation, as opposed to a gene
that is the result of gene duplication). Messier and Stewart (1997). This distinction may be
made by performing additional comparisons with other non-human primates, such as
gorilla and orangutan, which allows for phylogenetic tree-building. Orthologous genes
when used in tree-building will yield the known "species tree", i.e., will produce a tree that
recovers the known biological tree. In contrast, paralogous genes will yield trees which
will violate the known biological tree.

It is understood that the methods described herein could lead to the identification of
human polynucleotide sequences that are functionally related to human protein-coding
Evol 36:96-99; Messier and Stewart (1997) Nature 385:151-154; Nei (1987) Molecular
Evolutionary Genetics (New York, Columbia University Press). The KA/KS method, which
comprises a comparison of the rate of non-synonymous substitutions per non-synonymous
site with the rate of synonymous substitutions per synonymous site between homologous
protein-coding regions of genes in terms of a ratio, is used to identify sequence substitutions
that may be driven by adaptive selections as opposed to neutral selections during evolution.
A synonymous ("silent") substitution is one that, owing to the degeneracy of the genetic
code, makes no change to the amino acid sequence encoded; a non-synonymous
substitution results in an amino acid replacement. The extent of each type of change can be
estimated as KS and KA, respectively, the numbers of synonymous substitutions per
synonymous site and non-synonymous substitutions per non-synonymous site.
Calculations of KA/KS may be performed manually or by using software. An example of a
suitable program is MEGA (Molecular Genetics Institute, Pennsylvania State University).
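
To make the two quantities concrete, the following is a minimal sketch of a pairwise KA and KS estimate. It is illustrative only: it uses a simple Nei-Gojobori style counting scheme with a Jukes-Cantor correction for multiple hits, not the Li (1993) method or the MEGA program named in the text. The function names are hypothetical, the input is assumed to be two aligned, gap-free, in-frame coding sequences without stop codons, and single-base changes to a stop codon are simply counted as non-synonymous.

```python
from itertools import permutations, product
from math import log

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def syn_sites(codon):
    """Synonymous sites in one codon: per position, the fraction of the
    three possible single-base changes that leave the amino acid unchanged."""
    total = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    total += 1 / 3
    return total


def count_diffs(c1, c2):
    """Synonymous and non-synonymous differences between two codons,
    averaged over every order in which the differing bases could change."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    paths = []
    for order in permutations(positions):
        cur, syn, non = c1, 0, 0
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                break  # skip pathways that pass through a stop codon
            if CODE[nxt] == CODE[cur]:
                syn += 1
            else:
                non += 1
            cur = nxt
        else:
            paths.append((syn, non))
    if not paths:  # every pathway hit a stop codon: count all as non-synonymous
        return 0.0, float(len(positions))
    return (sum(p[0] for p in paths) / len(paths),
            sum(p[1] for p in paths) / len(paths))


def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits at one site."""
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (KA, KS) for two aligned, gap-free, in-frame coding sequences."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    pairs = [(seq1[i:i + 3].upper(), seq2[i:i + 3].upper())
             for i in range(0, len(seq1), 3)]
    s_sites = sum((syn_sites(a) + syn_sites(b)) / 2 for a, b in pairs)
    n_sites = 3 * len(pairs) - s_sites
    if s_sites == 0 or n_sites == 0:
        raise ValueError("no synonymous or no non-synonymous sites to compare")
    syn = non = 0.0
    for a, b in pairs:
        s, n = count_diffs(a, b)
        syn += s
        non += n
    return jukes_cantor(non / n_sites), jukes_cantor(syn / s_sites)
```

Calling `ka_ks(human_cds, chimp_cds)` returns the pair from which the ratio is formed. Because KS can be zero for very similar sequences, a caller would need to decide how to treat an undefined ratio.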

For the purpose of estimating KA and KS, either complete or partial human protein-coding
sequences are used to calculate total numbers of synonymous and non-synonymous
substitutions, as well as non-synonymous and synonymous sites. The length of the
polynucleotide sequence analyzed can be any appropriate length. Preferably, the entire
coding sequence is compared, in order to determine any and all significant changes.
Publicly available computer programs, such as Li93 (Li (1993) J. Mol. Evol. 36:96-99) or
INA, can be used to calculate the KA and KS values for all pairwise comparisons. This
analysis can be further adapted to examine sequences in a "sliding window" fashion such 
that small numbers of important changes are not masked by the whole sequence. "Sliding 
window" refers to examination of consecutive, overlapping subsections of the gene (the 
subsections can be of any length). 
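
The sliding window idea can be sketched as follows. This is a hedged illustration: the function name, the default window of 50 codons, and the default step of 10 codons are assumptions chosen for the example, since the text leaves the subsection length open. Each window would then be passed to whichever KA/KS estimator is in use.

```python
def sliding_windows(cds, window_codons=50, step_codons=10):
    """Return (start_codon, subsequence) pairs for consecutive, overlapping
    codon windows of an in-frame coding sequence."""
    if window_codons < 1 or step_codons < 1:
        raise ValueError("window and step must be at least one codon")
    n_codons = len(cds) // 3
    if n_codons <= window_codons:
        # Sequence shorter than one window: analyze it whole.
        return [(0, cds[: n_codons * 3])]
    last = n_codons - window_codons
    starts = list(range(0, last + 1, step_codons))
    if starts[-1] != last:
        starts.append(last)  # make sure the 3' end is covered
    return [(s, cds[s * 3:(s + window_codons) * 3]) for s in starts]
```

Working in codon units rather than nucleotides keeps every window in frame, so synonymous and non-synonymous changes can still be told apart within each subsection.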

The comparison of non-synonymous and synonymous substitution rates is
represented by the KA/KS ratio. KA/KS has been shown to be a reflection of the degree to
which adaptive evolution has been at work in the sequence under study. Full length or
partial segments of a coding sequence can be used for the KA/KS analysis. The higher the
KA/KS ratio, the more likely that a sequence has undergone adaptive evolution and the
non-synonymous substitutions are evolutionarily significant. See, for example, Messier and
Stewart (1997). Preferably, the KA/KS ratio is at least about 0.75, more preferably at least
points where there are changes between the non-human primate and human sequences; (b)
checking the sequence fluorogram (chromatogram) to determine if the bases that appear
unique to non-human primate correspond to strong, clear signals specific for the called
base; (c) checking the human hits to see if there is more than one human sequence that
corresponds to a sequence change. Multiple human sequence entries for the same gene that
have the same nucleotide at a position where there is a different nucleotide in a non-human
primate sequence provide independent support that the human sequence is accurate, and
that the change is significant. Such changes are examined using public database
information and the genetic code to determine whether these nucleotide sequence changes
result in a change in the amino acid sequence of the encoded protein. As the definition of
"nucleotide change" makes clear, the present invention encompasses at least one nucleotide
change, either a substitution, a deletion or an insertion, in a human protein-coding
polynucleotide sequence as compared to corresponding sequence from a non-human
primate. Preferably, the change is a nucleotide substitution. More preferably, more than
one substitution is present in the identified human sequence and is subjected to molecular
evolution analysis.
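
As an illustration of using the genetic code to decide whether a nucleotide change alters the encoded protein, the sketch below walks two aligned coding sequences codon by codon. It is a simplified example: the function name is hypothetical, only substitutions are handled (no insertions or deletions), and the inputs are assumed to be aligned, gap-free, and in frame.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_changes(human_cds, primate_cds):
    """List (codon_index, human_codon, primate_codon, kind) for each codon
    that differs, where kind is 'synonymous' or 'non-synonymous'."""
    if len(human_cds) != len(primate_cds) or len(human_cds) % 3:
        raise ValueError("sequences must be aligned, gap-free, and in frame")
    changes = []
    for i in range(0, len(human_cds), 3):
        h = human_cds[i:i + 3].upper()
        p = primate_cds[i:i + 3].upper()
        if h != p:
            same_aa = CODON_TABLE[h] == CODON_TABLE[p]
            kind = "synonymous" if same_aa else "non-synonymous"
            changes.append((i // 3, h, p, kind))
    return changes
```

For example, `classify_changes("CTGAAA", "CTAAGA")` reports codon 0 (CTG to CTA, both leucine) as synonymous and codon 1 (AAA to AGA, lysine to arginine) as non-synonymous.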

Any of several different molecular evolution analyses or KA/KS-type methods can be
employed to evaluate quantitatively and qualitatively the evolutionary significance of the
identified nucleotide changes between human gene sequences and those of a non-human
primate. Kreitman and Akashi (1995) Annu. Rev. Ecol. Syst. 26:403-422; Li, Molecular
Evolution, Sinauer Associates, Sunderland, MA, 1997. For example, positive selection on
proteins (i.e., molecular-level adaptive evolution) can be detected in protein-coding genes
by pairwise comparisons of the ratios of nonsynonymous nucleotide substitutions per
nonsynonymous site (KA) to synonymous substitutions per synonymous site (KS) (Li et al.,
1985; Li, 1993). Any comparison of KA and KS may be used, although it is particularly
convenient and most effective to compare these two variables as a ratio. Sequences are
identified by exhibiting a statistically significant difference between KA and KS using
standard statistical methods.
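
One way such a significance check might look is sketched below. This is an assumption-laden example rather than the test used in the cited work: it takes KA and KS together with standard errors (which the estimating program is assumed to supply), treats the two estimates as independent, and uses a one-sided normal approximation in place of a t distribution. The function name is hypothetical.

```python
from math import erfc, sqrt


def ka_greater_than_ks(ka, ks, se_ka, se_ks):
    """Return (z, p) for the one-sided hypothesis KA > KS, using a normal
    approximation and treating the two estimates as independent."""
    pooled = sqrt(se_ka ** 2 + se_ks ** 2)
    if pooled == 0:
        raise ValueError("standard errors must not both be zero")
    z = (ka - ks) / pooled
    p = 0.5 * erfc(z / sqrt(2))  # upper-tail probability of a standard normal
    return z, p
```

A small `p` suggests that KA exceeds KS by more than sampling error alone would explain. For short sequences with few substitutions the normal approximation is rough, so exact or likelihood-based tests may be more appropriate.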

Preferably, the KA/KS analysis by Li et al. is used to carry out the present invention,
although other analysis programs that can detect positively selected genes between species
can also be used. Li et al. (1985) Mol Biol Evol 2:150-174; Li (1993); see also J. Mol
bonobo, gorilla, orangutan, gibbon, Old World monkeys, and New World monkeys. A
phylogenetic tree for primates within the hominoid group is depicted in FIG. 1.
Bioinformatics is applied to the comparison and sequences are selected that contain a
nucleotide change or changes that is/are evolutionarily significant. The invention
enables the identification of genes that have evolved to confer some evolutionary advantage
and the identification of the specific evolved changes.

Protein-coding sequences of human and a non-human primate are compared
to identify homologous sequences. Any appropriate mechanism for completing this
comparison is contemplated by this invention. Alignment may be performed manually or
by software (examples of suitable alignment programs are known in the art). Preferably,
protein-coding sequences from a non-human primate are compared to human sequences via
database searches, e.g., BLAST searches. The high scoring "hits," i.e., sequences that show
a significant similarity after BLAST analysis, will be retrieved and analyzed. Sequences
showing a significant similarity can be those having at least about 60%, at least about 75%,
at least about 80%, at least about 85%, or at least about 90% sequence identity. Preferably,
sequences showing greater than about 80% identity are further analyzed. The homologous
sequences identified via database searching can be aligned in their entirety using sequence
alignment methods and programs that are known and available in the art, such as the
commonly used simple alignment program CLUSTAL V by Higgins et al. (1992) CABIOS
8:189-191.
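
The identity cutoff can be illustrated with a short sketch. It assumes the pairwise alignments already exist (for instance, exported from a database search) and that identity is computed over aligned, non-gap columns; search tools may define percent identity differently. The function names and the dictionary layout are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def select_hits(alignments, min_identity=80.0):
    """Keep hits whose identity is greater than min_identity.
    alignments maps a hit name to (aligned_query, aligned_hit)."""
    return [name for name, (query, hit) in alignments.items()
            if percent_identity(query, hit) > min_identity]
```

The default of 80.0 mirrors the "greater than about 80%" preference stated above; the other thresholds listed in the text can be passed as `min_identity`.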

Alternatively, the sequencing and homologous comparison of protein-coding 
sequences between human and a non-human primate may be performed simultaneously by 
using the newly developed sequencing chip technology. See, for example, Rava et al., US
Patent 5,545,531. 

The aligned protein-coding sequences of human and a non-human primate are
analyzed to identify nucleotide sequence differences at particular sites. Again, any suitable
method for achieving this analysis is contemplated by this invention. If there are no
nucleotide sequence differences, the non-human primate protein coding sequence is not
usually further analyzed. The detected sequence changes are generally, and preferably,
initially checked for accuracy. Preferably, the initial checking comprises performing one or
more of the following steps, any and all of which are known in the art: (a) finding the
sequences are, for example, those that are only found in a specific tissue, such as brain, or
that correspond to genes likely to be important in the specific function, such as brain
function. Such human tissue-specific cDNA sequences can be obtained by searching
online human sequence databases such as GenBank, in which information with respect to
the expression profile and/or biological activity for cDNA sequences is specified.

Sequences of non-human primate (for example, from an AIDS-resistant non-human 
primate) homologue(s) to a known human gene may be obtained using methods standard 
in the art, such as from public databases such as GenBank or PCR methods (using, for
example, GeneAmp PCR System 9700 thermocyclers (Applied Biosystems, Inc.)). For
example, non-human primate cDNA candidates for sequencing can be selected by PCR
using primers designed from candidate human cDNA sequences. For PCR, primers may be 
made from the human sequences using standard methods in the art, including publicly 
available primer design programs such as PRIMER® (Whitehead Institute). The sequence 
amplified may then be sequenced using standard methods and equipment in the art, such as 
automated sequencers (Applied Biosystems, Inc.). 

GENERAL METHODS OF THE INVENTION 

The general method of the invention is as follows. Briefly, nucleotide sequences 
are obtained from a human source and a non-human source. The human and non-human 
nucleotide sequences are compared to one another to identify sequences that are 
homologous. The homologous sequences are analyzed to identify those that have nucleic 
acid sequence differences between the two species. Then molecular evolution analysis is 
conducted to evaluate quantitatively and qualitatively the evolutionary significance of the 
differences. For genes that have been positively selected between two species, e.g., human 
and chimp, it is useful to determine whether the difference occurs in other non-human 
primates. Next, the sequence is characterized in terms of molecular/genetic identity and 
biological function. Finally, the information can be used to identify agents useful in 
diagnosis and treatment of human medically or commercially relevant conditions. 

The general methods of the invention entail comparing human protein-coding 
nucleotide sequences to protein-coding nucleotide sequences of a non-human, preferably a 
primate, and most preferably a chimpanzee. Examples of other non-human primates are
expressed in a primate cell using standard techniques in the art. In some embodiments, the
cDNA is prepared from mRNA obtained from a tissue at a determined developmental stage,
or a tissue obtained after the primate has been subjected to certain environmental
conditions. cDNA libraries used for the sequence comparison of the present invention can
be constructed using conventional cDNA library construction techniques that are explained
fully in the literature of the art. Total mRNAs are used as templates to reverse-transcribe
cDNAs. Transcribed cDNAs are subcloned into appropriate vectors to establish a cDNA
library. The established cDNA library can be maximized for full-length cDNA contents,
although less than full-length cDNAs may be used. Furthermore, the sequence frequency
can be normalized according to, for example, Bonaldo et al. (1996) Genome Research
6:791-806. cDNA clones randomly selected from the constructed cDNA library can be
sequenced using standard automated sequencing techniques. Preferably, full-length cDNA
clones are used for sequencing. Either the entire or a large portion of cDNA clones from a
cDNA library may be sequenced, although it is also possible to practice some embodiments
of the invention by sequencing as little as a single cDNA, or several cDNA clones.

In one preferred embodiment of the present invention, non-human primate cDNA 
clones to be sequenced can be pre-selected according to their expression specificity. In 
order to select cDNAs corresponding to active genes that are specifically expressed, the 
cDNAs can be subject to subtraction hybridization using mRNAs obtained from other
organs, tissues or cells of the same animal. Under certain hybridization conditions with
appropriate stringency and concentration, those cDNAs that hybridize with
non-tissue-specific mRNAs and thus likely represent "housekeeping" genes will be excluded from the
cDNA pool. Accordingly, remaining cDNAs to be sequenced are more likely to be
associated with tissue-specific functions. For the purpose of subtraction hybridization,
non-tissue-specific mRNAs can be obtained from one organ, or preferably from a combination
of different organs and cells. The amount of non-tissue-specific mRNAs is maximized to
saturate the tissue-specific cDNAs.

Alternatively, information from online public databases can be used to select or give
priority to cDNAs that are more likely to be associated with specific functions. For
example, the non-human primate cDNA candidates for sequencing can be selected by PCR
using primers designed from candidate human cDNA sequence. Candidate human cDNA
at least one index of HIV susceptibility is affected, using a cell-based system as described
herein, as compared with an appropriate control. Indicia of HIV susceptibility include, but
are not limited to, cell-to-cell transmission of the virus, as measured by total number of
cells infected with HIV and syncytia formation.

The term "target site" means a location in a polypeptide which can be a single
amino acid and/or is a part of a structural and/or functional motif, e.g., a binding site, a
dimerization domain, or a catalytic active site. Target sites may be useful for direct or
indirect interaction with an agent, such as a therapeutic agent.

The term "molecular difference" includes any structural and/or functional
difference. Methods to detect such differences, as well as examples of such differences, are
described herein.

A "functional effect" is a term well known in the art, and means any effect which is 
exhibited on any level of activity, whether direct or indirect. 



General Procedures Known in the Art 

For the purposes of this invention, the source of the human and non-human 
polynucleotide can be any suitable source, e.g., genomic sequences or cDNA sequences. 
Preferably, cDNA sequences from human and a non-human primate are compared. Human
protein-coding sequences can be obtained from public databases such as the Genome
Sequence Data Bank and GenBank. These databases serve as repositories of the molecular
sequence data generated by ongoing research efforts. Alternatively, human protein-coding
sequences may be obtained from, for example, sequencing of cDNA reverse transcribed
from mRNA expressed in human cells, or after PCR amplification, according to methods
well known in the art. Alternatively, human genomic sequences may be used for sequence
comparison. Human genomic sequences can be obtained from public databases or from
sequencing of commercially available human genomic DNA libraries or from genomic
DNA, after PCR.

The non-human primate protein-coding sequences can be obtained by, for example,
sequencing cDNA clones that are randomly selected from a non-human primate cDNA
library. The non-human primate cDNA library can be constructed from total mRNA
animal extracts, and the like. Compounds can be tested singly or in combination with one
another. 

The term "to modulate function" of a polynucleotide or a polypeptide means that 
the function of the polynucleotide or polypeptide is altered when compared to not adding an 
agent. Modulation may occur on any level that affects function. A polynucleotide or 
polypeptide function may be direct or indirect, and measured directly or indirectly. 

A "function of a polynucleotide" includes, but is not limited to, replication; 
translation; expression pattern(s). A polynucleotide function also includes functions 
associated with a polypeptide encoded within the polynucleotide. For example, an agent 
which acts on a polynucleotide and affects protein expression, conformation, folding (or 
other physical characteristics), binding to other moieties (such as ligands), activity (or other 
functional characteristics), regulation and/or other aspects of protein structure or function is 
considered to have modulated polynucleotide function. 

A "function of a polypeptide" includes, but is not limited to, conformation, folding 
(or other physical characteristics), binding to other moieties (such as ligands), activity (or 
other functional characteristics), and/or other aspects of protein structure or functions. For 
example, an agent that acts on a polypeptide and affects its conformation, folding (or other 
physical characteristics), binding to other moieties (such as ligands), activity (or other 
functional characteristics), and/or other aspects of protein structure or functions is 
considered to have modulated polypeptide function. The ways that an effective agent can 
act to modulate the function of a polypeptide include, but are not limited to, 1) changing the
conformation, folding or other physical characteristics; 2) changing the binding strength to 
its natural ligand or changing the specificity of binding to ligands; and 3) altering the 
activity of the polypeptide. 

The terms "modulate susceptibility to development of AIDS" and "modulate 
resistance to development of AIDS", as used herein, include modulating intra-organism 
cell-to-cell transmission or infectivity of HIV. The terms further include reducing 
susceptibility to development of AIDS and/or cell-to-cell transmission or infectivity of 
HIV. The terms further include increasing resistance to development of AIDS and/or cell- 
to-cell transmission or infectivity of HIV. One means of assessing whether an agent is one 
that modulates susceptibility or resistance to development of AIDS is to determine whether
"AIDS resistant" means that an organism, such as a chimpanzee, exhibits an ability
to avoid, or diminish the extent of, the result of HIV infection (such as propagation and 
dissemination) and/or development of AIDS, preferably when compared to AIDS- 
susceptible humans. 

"Susceptibility" to AIDS means that an organism, such as a human, fails to avoid,
or diminish the extent of, the result of HIV infection (such as propagation and
dissemination) and/or development of AIDS, preferably when compared to an organism
that is known to be AIDS resistant, such as a non-human primate, such as chimpanzee.

The term "brain protein-coding nucleotide sequence" as used herein refers to a
nucleotide sequence expressed in the brain that encodes a protein. One example of the
"brain protein-coding nucleotide sequence" is a brain cDNA sequence.

As used herein, the term "brain functions unique or enhanced in humans" or
"unique functional capabilities of the human brain" or "brain functional capability that is
unique or enhanced in humans" refers to any brain function, either in kind or in degree, that
is identified and/or observed to be enhanced in humans compared to non-human
primates. Such brain functions include, but are not limited to, high capacity information
processing, storage and retrieval capabilities, creativity, memory, language abilities,
brain-mediated emotional response, locomotion, pain/pleasure sensation, olfaction, and
temperament.

"Housekeeping genes" is a term well understood in the art and means those genes
associated with general cell function, including but not limited to growth, division, stasis,
metabolism, and/or death. "Housekeeping" genes generally perform functions found in
more than one cell type. In contrast, cell-specific genes generally perform functions in a
particular cell type (such as neurons) and/or class (such as neural cells).

The term "agent", as used herein, means a biological or chemical compound such as
a simple or complex organic or inorganic molecule, a peptide, a protein or an
oligonucleotide. A vast array of compounds can be synthesized, for example oligomers,
such as oligopeptides and oligonucleotides, and synthetic organic and inorganic compounds
based on various core structures, and these are also included in the term "agent". In
addition, various natural sources can provide compounds for screening, such as plant or
The term "homologous" or "homologue" or "ortholog" is known and well
understood in the art and refers to related sequences that share a common ancestor and is
determined based on degree of sequence identity. These terms describe the relationship
between a gene found in one species and the corresponding or equivalent gene in another
species. For purposes of this invention homologous sequences are compared.
"Homologous sequences" or "homologues" or "orthologs" are thought, believed, or known
to be functionally related. A functional relationship may be indicated in any one of a
number of ways, including, but not limited to, (a) degree of sequence identity; (b) same or
similar biological function. Preferably, both (a) and (b) are indicated. The degree of
sequence identity may vary, but is preferably at least 50% (when using standard sequence
alignment programs known in the art), more preferably at least 60%, more preferably at
least about 75%, more preferably at least about 85%. Homology can be determined using
software programs readily available in the art, such as those discussed in Current Protocols
in Molecular Biology (F.M. Ausubel et al., eds., 1987) Supplement 30, section 7.7.18, Table
7.7.1. Preferred alignment programs are MacVector (Oxford Molecular Ltd, Oxford, U.K.)
and ALIGN Plus (Scientific and Educational Software, Pennsylvania). Another preferred
alignment program is Sequencher (Gene Codes, Ann Arbor, Michigan), using default
parameters.

The term "nucleotide change" refers to nucleotide substitution, deletion, and/or
insertion, as is well understood in the art.

The term "human protein-coding nucleotide sequence" which is "associated with 
susceptibility to AIDS" as used herein refers to a human nucleotide sequence that encodes a 
protein that is associated with HIV dissemination (within the organism, i.e., intra-organism 
infectivity), propagation and/or development of AIDS. Due to the extensive research in the
mechanisms underlying progression from HIV infection to the development of AIDS, a
number of candidate human genes are believed or known to be associated with one or more
of these phenomena. A polynucleotide (including any polypeptide encoded therein)
sequence associated with susceptibility to AIDS is one which is either known or implicated
to play a role in HIV dissemination, replication, and/or subsequent progression to
full-blown AIDS. Examples of such candidate genes are provided below.



WO 99/39006 



PCT/US99/01964 



are designated using several systems of nomenclature, including but not limited to KA/KS,
dN/dS, DN/DS.

The terms "evolutionarily significant change" or "adaptive evolutionary change"
refer to one or more nucleotide or peptide sequence change(s) between two species that
may be attributed to a positive selective pressure. One method for determining the
presence of an evolutionarily significant change is to apply a KA/KS-type analytical method,
such as to measure a KA/KS ratio. Typically, a KA/KS ratio at least about 0.75, more
preferably at least about 1.0, more preferably at least about 1.25, more preferably at least
about 1.5 and most preferably at least about 2.0 indicates the action of positive selection
and is considered to be an evolutionarily significant change.
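
The graded thresholds in this definition can be expressed as a small lookup. The sketch below is illustrative only: the function name is hypothetical, and it reads each "at least about" value as an exact cutoff, which is stricter than the wording of the definition.

```python
# Thresholds from the definition above, most to least stringent.
THRESHOLDS = (2.0, 1.5, 1.25, 1.0, 0.75)


def highest_threshold_met(ka, ks):
    """Return the most stringent KA/KS threshold that the ratio reaches,
    or None if the ratio is below all of them."""
    if ks <= 0:
        raise ValueError("KS must be positive for the ratio to be defined")
    ratio = ka / ks
    for threshold in THRESHOLDS:
        if ratio >= threshold:
            return threshold
    return None
```

For instance, `highest_threshold_met(3.0, 2.0)` returns `1.5`, while a ratio of 0.5 returns `None` and would not be treated as evolutionarily significant under this definition.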

The term "positive evolutionarily significant change" means an evolutionarily
significant change in a particular species that results in an adaptive change that is positive
as compared to other related species. Examples of positive evolutionarily significant
changes are changes that have resulted in enhanced cognitive abilities in humans and
adaptive changes in chimpanzees that have resulted in the ability of the chimpanzees
infected with HIV to be resistant to progression to full-blown AIDS.

The term "resistant" means that an organism, such as a chimpanzee, exhibits an
ability to avoid, or diminish the extent of, a disease condition and/or development of the
disease, preferably when compared to non-resistant organisms, typically humans. For
example, a chimpanzee is resistant to certain impacts of HIV and other viral infections,
and/or it does not develop the ultimate disease - AIDS.

The term "susceptibility" means that an organism, such as a human, fails to avoid,
or diminish the extent of, a disease condition and/or development of the disease condition,
preferably when compared to an organism that is known to be resistant, such as a
non-human primate, such as chimpanzee. For example, a human is susceptible to certain
impacts of HIV and other viral infections and/or development of the ultimate disease -
AIDS.

It is understood that resistance and susceptibility vary from individual to individual,
and that, for purposes of this invention, these terms also apply to a group of individuals
within a species, and comparisons of resistance and susceptibility generally refer to overall,
average differences between species, although intra-specific comparisons may be used.
are post-translationally modified through reactions that include glycosylation, acetylation
and phosphorylation.

A "physiological condition" is a term well-understood in the art and means any
condition or state that can be measured and/or observed. A "physiological condition" includes,
but is not limited to, a physical condition, such as degree of body fat, alopecia (baldness), 
acne; life-expectancy; disease states (which include susceptibility and/or resistance to 
diseases), such as cancer or infectious diseases. Examples of physiological conditions are 
provided below (see, e.g., definitions of "human medically relevant condition",
"human commercially relevant condition", "medically relevant evolved trait", and 
"commercially relevant evolved trait") and throughout the specification, and it is 
understood that these terms and examples refer to a physiological condition. A 
physiological condition may be, but is not necessarily, the result of multiple factors, any of 
which in turn may be considered a physiological condition. A physiological condition 
which is "present" in a human or non-human primate occurs within a given population, and 
includes those physiological conditions which are unique and/or enhanced in a given 
population when compared to another population. 

The terms "human medically relevant condition" or "human commercially relevant 
condition" are used herein to refer to human conditions for which medical or non-medical 
(respectively) intervention is desired. 

The term "medically relevant evolved trait" is used herein to refer to traits that have 
evolved in humans or non-human primates whose analysis could provide information (e.g., 
physical or biochemical data) relevant to the development of a human medical treatment. 

The term "commercially relevant evolved trait" is used herein to refer to traits that 
have evolved in humans or non-human primates whose analysis could provide information 
(e.g., physical or biochemical data) relevant to the development of a non-medical product 
or treatment for human use. 

The term "KA/KS-type methods" means methods that evaluate differences,
frequently (but not always) shown as a ratio, between the number of nonsynonymous 
substitutions and synonymous substitutions in homologous genes (including the more 
rigorous methods that determine non-synonymous and synonymous sites). These methods
AIDS. Commercially relevant examples include, but are not limited to, polynucleotides
and polypeptides that are positively selected in non-human primates that may be associated
with aesthetic traits, such as hair growth, acne or muscle mass.

Positively selected human evolutionarily significant changes in polynucleotide and
polypeptide sequences may be attributed to human capabilities that provide humans with
competitive advantages, particularly when compared to the closest evolutionary relative,
chimpanzee, such as unique or enhanced human brain functions. The present invention
identifies human genes that evolved to provide unique or enhanced human cognitive
abilities, and the actual protein changes that confer functional differences will be quite
useful in therapeutic approaches to treat cognitive deficiencies as well as cognitive
enhancement for the general population.

The practice of the present invention employs, unless otherwise indicated, 
conventional techniques of molecular biology, genetics and molecular evolution, which are 
within the skill of the art. Such techniques are explained fully in the literature, such as: 
"Molecular Cloning: A Laboratory Manual", second edition (Sambrook et al., 1989); 
"Oligonucleotide Synthesis" (M.J. Gait, ed., 1984); "Current Protocols in Molecular 
Biology" (F.M. Ausubel et al., eds., 1987); "PCR: The Polymerase Chain Reaction", 
(Mullis et al., eds., 1994); "Molecular Evolution", (Li, 1997). 

Definitions 

As used herein, a "polynucleotide" refers to a polymeric form of nucleotides of any 
length, either ribonucleotides or deoxyribonucleotides, or analogs thereof. This term refers 
to the primary structure of the molecule, and thus includes double- and single-stranded 
DNA, as well as double- and single-stranded RNA. It also includes modified 
polynucleotides such as methylated and/or capped polynucleotides. The terms 
"polynucleotide" and "nucleotide sequence" are used interchangeably. 

As used herein, a "gene" refers to a polynucleotide or portion of a polynucleotide 
comprising a sequence that encodes a protein. It is well understood in the art that a gene 
also comprises non-coding sequences, such as 5' and 3' flanking sequences (such as 
promoters, enhancers, repressors, and other regulatory sequences) as well as introns. 

The terms "polypeptide," "peptide," and "protein" are used interchangeably herein 
to refer to polymers of amino acids of any length. These terms also include proteins that 
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humans, particularly primates, and most preferably chimpanzees, including the two known 
species, common chimpanzees and bonobos (pygmy chimpanzees). For example, 
chimpanzees and humans are 98.5% identical at the DNA sequence level and the present 
invention can identify the adaptive molecular changes underlying differences between the 
species in a number of areas, including unique or enhanced human cognitive abilities and 
chimpanzee resistance to AIDS and certain cancers. Unlike traditional genomics, which 
merely identifies genes, the present invention provides exact information on evolutionary 
solutions that eliminate disease or provide unique functions. The present invention 
identifies genes that have evolved to confer an evolutionary advantage and the specific 
evolved changes. 

The present invention results from the observation that human protein-coding 
polynucleotides may contain sequence changes that are found in humans but not in other 
evolutionarily closely related species such as non-human primates, as a result of adaptive 
selection during evolution. 

The present invention further results from the observation that the genetic 
information of non-human primates may contain changes that are found in a particular 
non-human primate but not in humans, as a result of adaptive selection during evolution. In this 
embodiment, a non-human primate polynucleotide or polypeptide has undergone natural 
selection that resulted in a positive evolutionarily significant change (i.e., the non-human 
primate polynucleotide or polypeptide has a positive attribute not present in humans). In 
this embodiment the positively selected polynucleotide or polypeptide may be associated 
with susceptibility or resistance to certain diseases or other commercially relevant traits. 
Medically relevant examples of this embodiment include, but are not limited to, 
polynucleotides and polypeptides that are positively selected in non-human primates, 
preferably chimpanzees, that may be associated with susceptibility or resistance to 
infectious diseases and cancer. An example of this embodiment includes polynucleotides 
and polypeptides associated with the susceptibility or resistance to progression from HIV 
infection to development of AIDS. The present invention can thus be useful in gaining 
insight into the molecular mechanisms that underlie resistance to progression from HIV 
infection to development of AIDS, providing information that can also be useful in 
discovering and/or designing agents such as drugs that prevent and/or delay development of 
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BRIEF DESCRIPTION OF THE DRAWINGS 
Figure 1 depicts a phylogenetic tree for primates within the hominoid group. The 
branching orders are based on well-supported mitochondrial DNA phylogenies. Messier 
and Stewart (1997) Nature 385:151-154. 

Figure 2 (SEQ ID NOS:1-3) is a nucleotide sequence alignment between human and 
chimpanzee ICAM-1 sequences (GenBank accession numbers X06990 and X86848, 
respectively). The amino acid translation of the chimpanzee sequence is shown below the 
alignment. 

Figure 3 shows the nucleotide sequence of gorilla ICAM-1 (SEQ ID NO:4). 

Figure 4 shows the nucleotide sequence of orangutan ICAM-1 (SEQ ID NO:5). 

Figures 5(A)-(E) show the polypeptide sequence alignment of ICAM-1 from several 
primate species. 

Figures 6(A)-(B) show the polypeptide sequence alignment of ICAM-2 from several 
primate species. 

Figures 7(A)-(P) show the polypeptide sequence alignment of ICAM-3 from several 
primate species. 

Figure 8 depicts a schematic representation of a procedure for comparing 
human/primate brain polynucleotides, selecting sequences with evolutionarily significant 
changes, and further characterizing the selected sequences. The diagram of Figure 8 
illustrates a preferred embodiment of the invention and together with the description serves 
to explain the principles of the invention, along with elaboration and optional additional 
steps. It is understood that any human/primate polynucleotide sequence can be compared 
by a similar procedure and that the procedure is not limited to brain polynucleotides. 

DETAILED DESCRIPTION OF THE INVENTION 
The present invention applies comparative genomics to identify specific gene 
changes which are associated with, and thus may contribute to or be responsible for, 
physiological conditions, such as medically or commercially relevant evolved traits. The 
invention comprises a comparative genomics approach to identify specific gene changes 
responsible for differences in functions and diseases distinguishing humans from other non- 
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methods described herein, with a corresponding human polypeptide (or a portion of the 
polypeptide), wherein a location of a molecular difference, such as an amino acid 
difference, if any, indicates a target site. 
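
By way of illustration only, the comparison described above can be sketched in Python as follows. The sketch assumes that the two polypeptide sequences have already been aligned to equal length, with "-" marking any gap; the function name candidate_target_sites and the example sequences are hypothetical and are not taken from the specification.

```python
def candidate_target_sites(human_protein, primate_protein):
    """List alignment columns where two pre-aligned polypeptides differ.

    Returns (position, human_residue, primate_residue) tuples, with
    1-based positions counted along the alignment.
    """
    if len(human_protein) != len(primate_protein):
        raise ValueError("sequences must be pre-aligned to the same length")
    sites = []
    for position, (h, p) in enumerate(zip(human_protein, primate_protein), start=1):
        if h != p:  # any molecular difference, including a gap, is reported
            sites.append((position, h, p))
    return sites


# Hypothetical aligned fragments, for illustration only.
for site in candidate_target_sites("MKTLV", "MRTLI"):
    print(site)
```

Each reported position is only a candidate; whether it is suitable for therapeutic intervention would still have to be assessed by the physical and biochemical methods described herein.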

Biochemical methods are meant to include methods that are used to examine 
functional differences, such as binding specificity, binding strength, or optimal binding 
conditions, for a protein encoded by a gene that has undergone adaptive evolution. Side- 
by-side comparison of biochemical characteristics of a protein (either human or non-human 
primate) and the evolved homologous protein (either non-human primate or human, 
respectively) will reveal valuable information for developing treatments for related human 
conditions and diseases. 

In another aspect, the invention provides methods of identifying an agent which 
may modulate a physiological condition, said method comprising contacting an agent (i.e., 
at least one agent to be tested) with a cell that has been transfected with a polynucleotide 
sequence identified by any of the methods described herein, wherein an agent is identified 
by its ability to modulate function of the polynucleotide sequence. In other embodiments, 
the invention provides methods of identifying an agent which may modulate a 
physiological condition, said method comprising contacting an agent (i.e., at least one 
agent) to be tested with a polypeptide (or a fragment of a polypeptide and/or a composition 
comprising a polypeptide or fragment of a polypeptide) encoded in or within a 
polynucleotide identified by any of the methods described herein, wherein an agent is 
identified by its ability to modulate function of the polypeptide. The invention also 
provides agents which are identified using the screening methods described herein. 

In another aspect, the invention provides methods of screening agents which may 
modulate the activity of the human polynucleotide or polypeptide to either modulate a 
unique or enhanced human function or to mimic the non-human primate trait of interest, 
such as susceptibility or resistance to development of a disease, such as AIDS. These 
methods comprise contacting a cell which has been transfected with a polynucleotide 
sequence with an agent to be tested, and identifying agents based on their ability to 
modulate function of the polynucleotide or contacting a polypeptide preparation with an 
agent to be tested and identifying agents based upon their ability to modulate function of 
the polypeptide. 
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functional effect), if any, of (the presence or absence of) a polynucleotide sequence 
identified by any of the methods described herein, wherein presence of a functional effect 
indicates a correlation between the evolutionarily significant nucleotide change and the 
physiological condition. Alternatively, in these methods, a functional effect (if any) may be 
assessed using a polypeptide sequence (or a portion of the polypeptide sequence) encoded 
by a nucleotide sequence identified by any of the methods described herein. 

The present invention also provides comparison of the identified polypeptides by 

[...] 
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biochemical consequences of the evolutionarily significant changes. Physical methods are 
meant to include methods that are used to examine structural changes to proteins encoded 
by genes found to have undergone adaptive evolution. Side-by-side comparison of the 
three-dimensional structures of a protein (either human or non-human primate) and the 
evolved homologous protein (either non-human primate or human, respectively) will 
provide valuable information for developing treatments for related human conditions and 
diseases. For example, using the methods of the present invention, the chimpanzee ICAM-1 
gene was identified as having positive evolutionary changes compared to human ICAM-1. 
In a three-dimensional model of two functional domains of the human ICAM-1 protein, 
it can be seen that five of the six amino acids that have been changed in chimpanzees are 
immediately adjacent to (i.e., physically touching) amino acid residues known to be crucial 
for binding to the ICAM-1 counter-receptor, LFA-1; in each case, the human amino acid 
has been replaced by a larger amino acid in the chimpanzee ICAM-1. Such information 
allows insight into designing appropriate therapeutic intervention(s). 

Accordingly, in another aspect, the invention provides methods for identifying a 
target site (which includes one or more target sites) which may be suitable for therapeutic 
intervention, comprising comparing a human polypeptide (or a portion of the polypeptide) 
encoded in a sequence identified by any of the methods described herein, with a 
corresponding non-human polypeptide (or a portion of the polypeptide), wherein a location 
of a molecular difference, if any, indicates a target site. In another aspect, the invention 
provides methods for identifying a target site (which includes one or more target sites) 
which may be suitable for therapeutic intervention, comprising comparing a non-human 
polypeptide (or a portion of the polypeptide) encoded in a sequence identified by any of the 
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comparing human protein-coding nucleotide sequences to protein-coding nucleotide 
sequences of a non-human primate; and (b) selecting a human polynucleotide sequence that 
contains at least one (i.e., one or more) nucleotide change as compared to corresponding 
sequence of the non-human primate, wherein said change is evolutionarily significant. The 
sequences identified by this method may be further characterized and/or analyzed for their 
possible association with biologically or medically relevant functions unique or enhanced 
in humans. 

Another embodiment of the present invention is a method for large scale sequence 
comparison between human protein-coding polynucleotide sequences and the protein- 
coding polynucleotide sequences from a non-human primate, e.g., chimpanzee, comprising: 
(a) aligning the human polynucleotide sequences with corresponding polynucleotide 
sequences from non-human primate according to sequence homology; and (b) identifying 
any nucleotide changes within the human sequences as compared to the homologous 
sequences from the non-human primate, wherein the changes are evolutionarily significant. 
In some embodiments, the protein coding sequences are from brain. 
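
Steps (a) and (b) above can be illustrated with a minimal Python sketch. It assumes that the human and non-human primate coding sequences have already been aligned in frame, without gaps, and it uses the standard genetic code. It only labels each differing codon as synonymous or non-synonymous; it does not by itself establish that a change is evolutionarily significant. The function name classify_changes and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_changes(human_cds, primate_cds):
    """Return (codon_index, human_codon, primate_codon, kind) per differing codon.

    Assumption: both sequences are in frame, gap-free and of equal length.
    """
    if len(human_cds) != len(primate_cds) or len(human_cds) % 3:
        raise ValueError("sequences must be aligned, gap-free and a multiple of 3 long")
    changes = []
    for start in range(0, len(human_cds), 3):
        h = human_cds[start:start + 3].upper()
        p = primate_cds[start:start + 3].upper()
        if h == p:
            continue
        # Same encoded residue means the nucleotide change is synonymous.
        same_residue = CODON_TABLE[h] == CODON_TABLE[p]
        kind = "synonymous" if same_residue else "non-synonymous"
        changes.append((start // 3, h, p, kind))
    return changes


# Hypothetical aligned fragments, for illustration only.
human = "ATGAAAGGTCTG"
chimp = "ATGAAGGATCTG"
for change in classify_changes(human, chimp):
    print(change)
```

In practice the aligned sequences would come from a homology-based alignment of the protein-coding sequences, and the codon-level labels would feed a K_A/K_S-type analysis.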

In some embodiments, a nucleotide change identified by any of the methods 
described herein is a non-synonymous substitution. In some embodiments, the 
evolutionary significance of the nucleotide change is determined according to the 
non-synonymous substitution rate (K_A) of the nucleotide sequence. In some embodiments, the 
evolutionarily significant changes are assessed by determining the K_A/K_S ratio between the 
human gene and the homologous gene from non-human primate (such as chimpanzee), and 
preferably that ratio is at least about 0.75, more preferably greater than about 1 (unity) (i.e., 
at least about 1), more preferably at least about 1.25, more preferably at least about 1.50, 
and more preferably at least about 2.00. In other embodiments, once a positively selected 
gene has been identified between human and a non-human primate (such as chimpanzee), 
further comparisons are performed with other non-human primates to confirm whether the 
human or the non-human primate (such as chimpanzee) gene has undergone positive 
selection. 
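
The ratio thresholds and the follow-up comparison with additional primates described above can be sketched in Python as follows. The sketch assumes that K_A and K_S have already been estimated by a K_A/K_S-type method. The lineage assignment uses a simple parsimony rule with a single outgroup sequence (for example gorilla), which is an illustrative simplification rather than the procedure of the specification. The names ka_ks_tier and assign_lineage are hypothetical.

```python
# Preferred K_A/K_S thresholds named in the text, most stringent first.
THRESHOLDS = (2.00, 1.50, 1.25, 1.00, 0.75)


def ka_ks_tier(ka, ks):
    """Return the most stringent threshold met by ka/ks, or None if none is met."""
    if ks <= 0:
        return None  # ratio is undefined without synonymous substitutions
    ratio = ka / ks
    for threshold in THRESHOLDS:
        if ratio >= threshold:
            return threshold
    return None


def assign_lineage(human, chimp, outgroup):
    """Assign each human/chimpanzee difference to a lineage using an outgroup.

    Assumption: the three sequences are aligned, gap-free and of equal length.
    If the outgroup matches the chimpanzee, the change is placed on the human
    lineage, and vice versa; otherwise the site is left unresolved.
    """
    if not (len(human) == len(chimp) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for position, (h, c, o) in enumerate(zip(human, chimp, outgroup)):
        if h == c:
            continue
        if o == c:
            calls.append((position, "human"))
        elif o == h:
            calls.append((position, "chimpanzee"))
        else:
            calls.append((position, "unresolved"))
    return calls


print(ka_ks_tier(0.75, 0.5))
print(assign_lineage("ACGT", "ATGA", "ACGC"))
```

A gene whose ratio meets one of the thresholds would then be compared against further non-human primates, as described above, before concluding which lineage has undergone positive selection.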

In another aspect, the invention provides methods for correlating an evolutionarily 
significant human nucleotide change to a physiological condition in a human (or humans), 
which comprise analyzing a functional effect (which includes determining the presence of a 
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sequences of a non-human primate, preferably a chimpanzee, that is resistant to a particular 
medically relevant disease state, wherein the human protein coding sequence is associated 
with development of the disease; and (b) selecting a non-human polynucleotide sequence 
that contains at least one nucleotide change as compared to the corresponding sequence of 
the human, wherein the change is evolutionarily significant. The sequences identified by 
these methods may be further characterized and/or analyzed to confirm that they are 
associated with the development of the disease state or condition. The most preferred 
disease states that are applicable to these methods are cancer and infectious diseases, 
including AIDS, hepatitis C and leprosy. 

In another aspect, the invention provides methods for identifying a polynucleotide 
sequence encoding a polypeptide, wherein said polypeptide may be associated with 
resistance to development of AIDS, comprising the steps of: (a) comparing AIDS resistant 
non-human primate protein coding sequences to human protein coding sequences, wherein 
the human protein coding sequences are associated with development of AIDS; and (b) 
selecting an AIDS resistant non-human primate sequence that contains at least one 
nucleotide change as compared to the corresponding human sequence, wherein the 
nucleotide change is evolutionarily significant. As indicated herein, these methods can be 
accomplished, for example, by aligning sequences according to their sequence homology 
and identifying a human polynucleotide sequence that comprises at least one unique 
nucleotide change over the corresponding polynucleotide sequence of the non-human 
primate, wherein the unique nucleotide change is positively selected according to an 
evolutionary analysis (as described herein). 

In another aspect, methods are provided for identifying an evolutionarily significant 
change in a human brain protein-coding polynucleotide sequence, comprising the steps of: 
a) comparing human brain protein-coding polynucleotide sequences to corresponding 
sequences of a non-human primate; and b) selecting a human polynucleotide sequence that 
contains a nucleotide change as compared to corresponding sequence of the non-human 
primate, wherein said change is evolutionarily significant. In some embodiments, the 
human brain protein coding nucleotide sequences correspond to human brain cDNAs. 

Another aspect of the invention includes methods for identifying a positively 
selected human evolutionarily significant change. These methods comprise the steps of: (a) 
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contains a nucleotide change as compared to corresponding sequence of the human, 
wherein said change is evolutionarily significant. In some embodiments, the non-human 
protein-coding sequences correspond to cDNA. In some embodiments, the sequences 
compared are from brain. Methods used to assess the nucleotide change, and the nature(s) 
of the nucleotide change, are described herein, and apply to any and all embodiments. In 
these methods (as in other like methods described herein), the non-human protein coding 
sequence (and/or the polypeptide encoded therein) may be associated with development 
and/or maintenance of a physiological trait. 

For any embodiment of this invention, the physiological condition may be any 
physiological condition, including those listed herein, such as, for example, disease 
(including susceptibility or resistance to disease) such as cancer, infectious disease 
(including viral diseases such as AIDS); life span; brain function, including cognitive 
function. 

In one aspect of the invention, methods are provided for identifying a 
polynucleotide sequence encoding a human polypeptide, wherein said polypeptide may be 
associated with a physiological condition that is present in human(s), comprising the steps 
of: a) comparing human protein-coding polynucleotide sequences to protein-coding 
polynucleotide sequences of a non-human primate, wherein the non-human primate does 
not have the physiological condition; and b) selecting a human polynucleotide sequence 
that contains a nucleotide change as compared to corresponding sequence of the non-human 
primate, wherein said change is evolutionarily significant. In some embodiments, the 
human protein coding sequence (and/or the polypeptide encoded therein) may be associated 
with development and/or maintenance of a physiological condition. In some embodiments, 
the human protein-coding sequences correspond to cDNA. In some embodiments, the 
sequences compared are from brain. In some embodiments, the physiological condition is 
life span. In other embodiments, the physiological condition is a brain function. In other 
embodiments, the brain function is cognitive function. Methods used to assess the 
nucleotide change, and the nature(s) of the nucleotide change, are described herein, and 
apply to any and all embodiments. 

In other embodiments, methods are provided that comprise the steps of: (a) 
comparing human protein-coding nucleotide sequences to protein-coding nucleotide 

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In another preferred embodiment, a human polynucleotide or polypeptide has 
undergone natural selection that resulted in a positive evolutionarily significant change (i.e., 
the human polynucleotide or polypeptide has a positive attribute not present in non-human 
primates). One example of this embodiment is that the polynucleotide or polypeptide may 
be associated with unique or enhanced functional capabilities of the human brain compared 
to non-human primates. Another is the longer life-span of humans compared to non-human 
primates. The present invention can thus be useful in gaining insight into the molecular 
mechanisms that underlie unique or enhanced human functions, providing information 
which can also be useful in designing agents such as drugs that modulate such unique or 
enhanced human functions, and in designing treatment of diseases or conditions related to 
humans. As an example, the present invention can thus be useful in gaining insight into the 
molecular mechanisms that underlie human cognitive function, providing information 
which can also be useful in designing agents such as drugs that enhance human brain 
function, and in designing treatment of diseases related to human brain. A specific example 
of a human gene that has positive evolutionarily significant changes when compared to 
non-human primates is a tyrosine kinase gene, KIAA 641. 

Accordingly, in one aspect, the invention provides methods for identifying a 
polynucleotide sequence encoding a polypeptide, wherein said polypeptide may be 
associated with a physiological condition (such as a medically or commercially relevant 
positive evolutionarily significant change). The positive evolutionarily significant change 
can be found in humans or in non-human primates. 

In one aspect of the invention, methods are provided for identifying a non-human 
polynucleotide sequence encoding a polypeptide, wherein said polypeptide may be 
associated with a physiological condition in the non-human primate, including but not 
limited to those physiological conditions listed (and throughout the specification), such as 
susceptibility or resistance to the development of a medically relevant disease state, such as 
an infectious disease (including viral disease, such as AIDS) or cancer. In some 
embodiments, methods are provided that comprise the steps of a) comparing non-human 
primate, preferably a chimpanzee, protein-coding polynucleotide sequences to protein- 
coding polynucleotide sequences of a human, wherein said human does not have the 
physiological condition; and b) selecting a non-human polynucleotide sequence that 

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DISCLOSURE OF THE INVENTION 
The present invention provides methods for identifying polynucleotide and 
polypeptide sequences having evolutionarily significant changes, which are associated with 
physiological conditions, including medical conditions. The invention applies comparative 
primate genomics to identify specific gene changes which may be associated with, and thus 
responsible for, physiological conditions, such as medically or commercially relevant 
evolved traits, and uses the information obtained from these evolved genes to develop 
human treatments. The non-human primate sequences employed in the methods described 
herein may be from any non-human primate, preferably a member of the hominoid group, 
more preferably a chimpanzee, bonobo, gorilla and/or orangutan, and most preferably a 
chimpanzee. 

In one preferred embodiment, a non-human primate polynucleotide or polypeptide 
has undergone natural selection that resulted in a positive evolutionarily significant change 
(i.e., the non-human primate polynucleotide or polypeptide has a positive attribute not 
present in humans). In this embodiment the positively selected polynucleotide or 
polypeptide may be associated with susceptibility or resistance to certain diseases or with 
other commercially relevant traits. Examples of this embodiment include, but are not 
limited to, polynucleotides and polypeptides that are positively selected in non-human 
primates, preferably chimpanzees, that may be associated with susceptibility or resistance 
to infectious diseases and cancer. An example of a commercially relevant trait may include 
aesthetic traits such as hair growth, muscle mass, and susceptibility or resistance to acne. An 
example of this embodiment includes polynucleotides and polypeptides associated with the 
susceptibility or resistance to HIV dissemination, propagation and/or development of 
AIDS. The present invention can thus be useful in gaining insight into the molecular 
mechanisms that underlie resistance to HIV dissemination, propagation and/or development 
of AIDS, providing information that can also be useful in discovering and/or designing 
agents such as drugs that prevent and/or delay development of AIDS. Specific genes that 
have been positively selected in chimpanzees that may relate to AIDS or other infectious 
diseases are ICAM-1, ICAM-2, ICAM-3 and MIP-1-α. 17-β-hydroxysteroid 
dehydrogenase Type IV is a specific gene that has been positively selected in chimpanzees 
and that may relate to cancer. 






WO 99/39006 



PCT/US99/01964 



to use these methods for therapeutic goals (as in the present application), but rather to 
pursue narrow academic goals. 

As can be seen from the papers cited above, analytical methods of molecular 
evolution to identify rapidly evolving genes (K_A/K_S-type methods) can be applied to 
achieve many different purposes, most commonly to confirm the existence of Darwinian 
molecular-level positive selection, but also to assess the frequency of Darwinian 
molecular-level positive selection, to understand phylogenetic relationships, to elucidate mechanisms 
by which new species are formed, or to establish single or multiple origin for specific gene 
polymorphisms. What is clear from the papers cited above and others in the literature is 
that none of the authors applied K_A/K_S-type methods to identify evolutionary solutions, 
that is, specific evolved changes, that could be mimicked or used in the development of treatments 
to prevent or cure human conditions or diseases or to modulate unique or enhanced human 
functions. They have not used K_A/K_S-type analysis as a systematic tool for identifying 
human or non-human primate genes that contain evolutionarily significant sequence 
changes and exploiting such genes and the identified changes in the development of 
treatments for human conditions or diseases. 

The identification of human genes that have evolved to confer unique or enhanced 
human functions compared to homologous chimpanzee genes could be applied to 
developing agents to modulate these unique human functions or to restore function when 
the gene is defective. The identification of the underlying chimpanzee (or other non-human 
primate) genes and the specific nucleotide changes that have evolved, and the further 
characterization of the physical and biochemical changes in the proteins encoded by these 
evolved genes, could provide valuable information, for example, on what determines 
susceptibility and resistance to infectious diseases, such as AIDS, what determines 
susceptibility or resistance to the development of certain cancers, what determines 
susceptibility or resistance to acne, how hair growth can be controlled, and how to control 
the formation of muscle versus fat. This valuable information could be applied to 
developing agents that cause the human proteins to behave more like their chimpanzee 
homologues. 

All references cited herein are hereby incorporated by reference in their entirety. 
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to molecular evolution studies in primates include hemoglobin, cytochrome c oxidase, and 
major histocompatibility complex (MHC). Nei and Hughes in: Evolution at the Molecular 
Level, Sinauer Associates, Sunderland, MA 222-247, 1991; Lienert and Parham (1996) 
Immunol Cell Biol. 74:349-356; Wu et al. (1997) J. Mol Evol. 44:477-491. Many 
non-coding sequences have also been used in molecular phylogenetic analysis of primates. Li, 
Molecular Evolution, Sinauer Associates, Sunderland, MA 1997. For example, the genetic 
distances among primate lineages were estimated from orthologous non-coding nucleotide 
sequences of beta-type globin loci and their flanking regions, and the evolutionary tree 
constructed for the nucleotide sequence orthologues depicted a branching pattern that is 
largely congruent with the picture from phylogenetic analyses of morphological characters. 
Goodman et al. (1990) J. Mol Evol 30:260-266. 

Zhou and Li (1996) Mol Biol Evol 13(6):780-783 applied K_A/K_S analysis to 
primate genes. It had previously been reported that gene conversion events likely have 
occurred in introns 2 and 4 between the red and green retinal pigment genes during human 
evolution. However, intron 4 sequences of the red and green retinal pigment genes from 
one European human were completely identical, suggesting a recent gene conversion event. 
In order to determine if the gene conversion event occurred in that individual, or a common 
ancestor of Europeans, or an even earlier hominid ancestor, the authors sequenced intron 4 
of the red and green pigment gene from a male Asian human, a male chimpanzee, and a 
male baboon, and applied K_A/K_S analysis. They observed that the divergence between the 
two genes is significantly lower in intron 4 than in surrounding exons, suggesting that 
strong natural selection has acted against sequence homogenization. 

Wolinsky et al. (1996) Science 272:537-542 used comparisons of nonsynonymous 
to synonymous base substitutions to demonstrate that the HIV virus itself (i.e., not the host 
species) is subject to adaptive evolution within individual human patients. Their goal was 
simply to document the occurrence of positive selection in a short time frame (that of a 
human patient's course of disease). Niewiesk and Bangham (1996) J Mol Evol 42:452-458 
used the D_n/D_s approach to ask a related question about the HTLV-1 virus, i.e., what are the 
selective forces acting on the virus itself. Perhaps because of an insufficient sample size, 
they were unable to resolve the nature of the selective forces. In both of these cases, 
although K_A/K_S-type methods were used in relation to a human virus, no attempt was made 






Tamura and Nei and phylogenetic analysis to investigate the diversification of MHC class I 
genes in squirrels and related rodents. Parham and Ohta (1996) Science 272:67-74 state 
that a population biology approach, including tests for selection as well as for gene 
conversion and neutral drift, is required to analyze the generation and maintenance of 
human MHC class I polymorphism. Hughes (1997) Mol Biol Evol. 14(1):1-5 compared 
over one hundred orthologous immunoglobulin C2 domains between human and rodent, 
using the method of Nei and Gojobori (d_N:d_S ratios) to test the hypothesis that proteins 
expressed in cells of the vertebrate immune system evolve unusually rapidly. Swanson and 
Vacquier (1998) Science 281:710-712 use d_N:d_S ratios to demonstrate concerted evolution 
between the lysin and the egg receptor for lysin and discuss the role of such concerted 
evolution in forming new species (speciation). 

Due to the distant evolutionary relationships between humans and these lower 
animals, the adaptively valuable genetic changes fixed by natural selection are often 
masked by the accumulation of neutral, random mutations over time. Moreover, some 
proteins evolve in an episodic manner; such episodic changes could be masked, leading to 
inconclusive results, if the two genomes compared are not close enough. Messier and 
Stewart (1997) Nature 385:151-154. In fact, studies have shown that the occurrence of 
adaptive selection in protein evolution is often underestimated when predominantly 
distantly related sequences are compared. Endo et al. (1996) Mol Biol Evol. 37:441-456; 
Messier and Stewart (1997) Nature 385:151-154. 

Molecular evolution studies within the primate family have been reported, but these 
mainly focus on the comparison of a small number of known individual genes and gene 
products to assess the rates and patterns of molecular changes and to explore the 
evolutionary mechanisms responsible for such changes. See generally, Li, Molecular 
Evolution, Sinauer Associates, Sunderland, MA, 1997. Furthermore, sequence comparison
data are used for phylogenetic analysis, wherein the evolutionary history of primates is
reconstructed based on the relative extent of sequence similarities among examined
molecules from different primates. For example, the DNA and amino acid sequence data
for the enzyme lysozyme from different primates were used to study protein evolution in
primates and the occurrence of adaptive selection within specific lineages. Malcolm et al.
(1990) Nature 345:86-89; Messier and Stewart (1997). Other genes that have been subject

WO 99/39006 



PCT/US99/01964 



al. (1995) Proc. R. Soc. Lond B 261:203-207 use the McDonald & Kreitman test to
investigate whether adaptive evolution is occurring in sequences controlling transcription 
(non-coding sequences). 
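
The McDonald & Kreitman test mentioned above is commonly laid out as a 2x2 contingency table: replacement versus synonymous changes, counted separately for fixed differences between species and for polymorphisms within a species. The Python sketch below shows that layout under stated assumptions. The function names, the counts in the example, and the choice of a two-sided Fisher exact test as the test of independence are all assumptions of this sketch, not details taken from the cited papers.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sums the probabilities of all tables with the same margins that are
    no more probable than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


def mk_test(fixed_repl, fixed_syn, poly_repl, poly_syn):
    """Test whether the replacement:synonymous ratio differs between fixed
    differences (between species) and polymorphisms (within species).
    Returns the p-value; the statistic used is an assumption of this sketch."""
    return fisher_exact_two_sided(fixed_repl, fixed_syn, poly_repl, poly_syn)


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    p = mk_test(fixed_repl=10, fixed_syn=20, poly_repl=3, poly_syn=40)
    print(f"p = {p:.4f}")
```

A small p-value indicates that the ratio of replacement to synonymous changes differs between fixed and polymorphic sites, which is the pattern the test treats as a departure from neutral expectations.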

Nakashima et al. (1995) Proc. Natl Acad. Sci USA 92:5606-5609, use the method 
of Miyata and Yasunaga to perform pairwise comparisons of the nucleotide sequences of 
ten PLA2 isozyme genes from two snake species; this method involves comparing the 
number of nucleotide substitutions per site for the noncoding regions including introns (K_N)
and the K_A and K_S. They conclude that the protein coding regions have been evolving at
much higher rates than the noncoding regions including introns. The highly accelerated 
substitution rate is responsible for Darwinian molecular-level evolution of PLA2 isozyme 
genes to produce new physiological activities that must have provided strong selective 
advantage for catching prey or for defense against predators. Endo et al. (1996) Mol Biol
Evol 13(5):685-690 use the method of Nei and Gojobori, wherein d_N is the number of
nonsynonymous substitutions and d_S is the number of synonymous substitutions, for the
purpose of identifying candidate genes on which positive selection operates. Metz and 
Palumbi (1996) Mol Biol Evol 13(2):397-406 use the McDonald & Kreitman test as well
as a method attributed to Nei and Gojobori, Nei and Jin, and Kumar, Tamura, and Nei,
examining the average proportions of P_n, the replacement substitutions per replacement site,
and P_s, the silent substitutions per silent site, to look for evidence of positive selection on
bindin genes in sea urchins to investigate whether they have rapidly evolved as a prelude to
species formation. Goodwin et al. (1996) Mol Biol Evol 13(2):346-358 use similar
methods to examine the evolution of a particular murine gene family and conclude that the
methods provide important fundamental insights into how selection drives genetic 
divergence in an experimentally manipulatable system. Edwards et al. (1995) use 
degenerate primers to pull out MHC loci from various species of birds and an alligator 
species, which are then analyzed by the Nei and Gojobori methods (d_N:d_S ratios) to extend
MHC studies to nonmammalian vertebrates. Whitfield et al. (1993) Nature 364:713-715
use K_A/K_S analysis to look for directional selection in the regions flanking a conserved
region in the SRY gene (that determines male sex). They suggest that the rapid evolution of
SRY could be a significant cause of reproductive isolation, leading to new species.
Wettstein et al. (1996) Mol Biol Evol 13(1):56-66 apply the MEGA program of Kumar,

products and functional domains can be used to help identify the homologous genes or 
functional domains in humans. For example, the homeo domain with DNA binding 
activity first discovered in the fruit fly Drosophila was used to identify human homologues 
that possess similar activities. 

Although comparison of homologous genes or proteins between human and a lower 
model organism may provide useful information with respect to evolutionarily conserved 
molecular sequences and functional features, this approach is of limited use in identifying 
genes whose sequences have changed due to natural selection. With the
development of sophisticated algorithms and analytical methods, much more information
can be teased out of DNA sequence changes. The most powerful of these methods,
"K_A/K_S", involves pairwise comparisons between aligned protein-coding nucleotide
sequences of the ratios of

nonsynonymous nucleotide substitutions per nonsynonymous site (K_A)
-------------------------------------------------------------------
synonymous substitutions per synonymous site (K_S)

(where nonsynonymous means substitutions that change the encoded amino acid and 
synonymous means substitutions that do not change the encoded amino acid). "K_A/K_S-type
methods" includes this and similar methods. These methods have been used to demonstrate
the occurrence of Darwinian molecular-level positive selection, resulting in amino acid
differences in homologous proteins. Several groups have used such methods to document
that a particular protein has evolved more rapidly than the neutral substitution rate, and thus
to support the existence of Darwinian molecular-level positive selection. For example,
McDonald and Kreitman (1991) Nature 351:652-654 propose a statistical test of the neutral
protein evolution hypothesis based on comparison of the number of amino acid replacement
substitutions to synonymous substitutions in the coding region of a locus. When they apply 
this test to the Adh locus of three Drosophila species, they conclude that it shows instead 
that the locus has undergone adaptive fixation of selectively advantageous mutations and 
that selective fixation of adaptive mutations may be a viable alternative to the clocklike 
accumulation of neutral mutations as an explanation for most protein evolution. Jenkins et

muscle to fat; (7) chimpanzees are more resistant to malaria; (8) chimpanzees are less
susceptible to Alzheimer's disease; and (9) chimpanzees have a lower incidence of 
atherosclerosis. At the present time, the genes underlying the above human/chimpanzee 
differences are not known, nor, more importantly, are the specific changes that have 
evolved in these genes to provide these capabilities. Understanding the basis of these 
differences between humans and our close evolutionary relatives will provide useful 
information for developing effective treatments for related human conditions and diseases. 

Classic evolution analysis, which compares mainly the anatomic features of 
animals, has revealed dramatic morphological and functional differences between human 
and non-human primates; yet, the human genome is known to share remarkable sequence 
similarities with that of other primates. For example, it is generally concluded that human 
DNA sequence is roughly 98.5% identical to chimpanzee DNA and only slightly less 
similar to gorilla DNA. McConkey and Goodman (1997) TIG 13:350-351. Given the 
relatively small percentage of genomic difference between humans and closely related 
primates, it is possible, if not likely, that a relatively small number of changes in genomic 
sequences may be responsible for traits of interest to human health and well-being, such as 
those listed above. Thus, it is desirable and feasible to identify the genes underlying these 
traits and to glean information from the evolved changes in the proteins they encode to 
develop treatments that could benefit human health and well-being. Identifying and 
characterizing these sequence changes is crucial in order to benefit from evolutionary 
solutions that have eliminated or minimized diseases or that provide unique or enhanced 
functions. 

Recent developments in the human genome project have provided a tremendous 
amount of information on human gene sequences. Furthermore, the structures and 
activities of many human genes and their protein products have been studied either directly 
in human cells in culture or in several animal model systems, such as the nematode, fruit 
fly, zebrafish and mouse. These model systems have great advantages in being relatively 
simple, easy to manipulate, and having short generation times. Because the basic structures 
and biological activities of many important genes have been conserved throughout 
evolution, homologous genes can be identified in many species by comparing 
macromolecule sequences. Information obtained from lower species on important gene

METHODS TO IDENTIFY POLYNUCLEOTIDE AND POLYPEPTIDE
SEQUENCES WHICH MAY BE ASSOCIATED WITH PHYSIOLOGICAL AND
MEDICAL CONDITIONS

CROSS-REFERENCE TO RELATED APPLICATIONS 
This application claims the priority benefit of U.S. Provisional Application No.
60/073,263, filed January 30, 1998, and U.S. Provisional Application No. 60/098,987, filed
September 2, 1998, which are both incorporated herein by reference.

STATEMENT OF RIGHTS TO INVENTIONS MADE UNDER
FEDERALLY SPONSORED RESEARCH

Not applicable. 

TECHNICAL FIELD 
This invention relates to using molecular and evolutionary techniques to identify
polynucleotide and polypeptide sequences corresponding to evolved traits that may be
relevant to human diseases or conditions, such as unique or enhanced human brain
functions, longer human life spans, susceptibility or resistance to development of infectious
disease (such as AIDS and hepatitis C), susceptibility or resistance to development of
cancer, and aesthetic traits, such as hair growth, susceptibility or resistance to acne, or
enhanced muscle mass.

BACKGROUND ART 
Humans differ from their closest evolutionary relatives, the non-human primates
such as chimpanzees, in certain physiological and functional traits that relate to areas
important to human health and well-being. For example, (1) humans have unique or
enhanced brain function (e.g., cognitive skills, etc.) compared to chimpanzees; (2) humans
have a longer life-span than non-human primates; (3) chimpanzees are resistant to certain
infectious diseases that afflict humans, such as AIDS and hepatitis C; (4) chimpanzees
appear to have a lower incidence of certain cancers than humans; (5) chimpanzees do not
suffer from acne or alopecia (baldness); (6) chimpanzees have a higher percentage of